Posted to users@jackrabbit.apache.org by "H. Wilson" <wi...@randdss.com> on 2010/06/04 15:21:59 UTC

jcr:contains with wildcards and underscores

Hello,

I am using Jackrabbit 2.0 with OCM. After searching the forums here and 
on Lucene, as well as Google, I have yet to find an answer. (As an 
aside, if this question should have gone to the Lucene users list, 
please let me know!)

For starters, you should know that our clients would like both 
case-sensitive and case-insensitive search options. The searches are on 
a property named fullName, which may contain underscores and always 
begins with a dot. (This is also a client requirement.) And yes, we are 
aware that leading wildcard searches are not ideal, but the clients 
still plan to use them. Here is my issue:

    * My searches using jcr:like work fine for all the scenarios I list
      below.
    * My searches with jcr:contains and exact names work fine (even with
      underscores!).
    * My jcr:contains searches using wildcards and underscores always
      fail. I have even tried escaping them.
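
For readers comparing the two functions: jcr:like uses SQL-style wildcards, where '%' matches any run of characters and '_' matches exactly one character, with backslash as the escape character. A literal underscore in a jcr:like pattern therefore needs escaping if it should match only an underscore. The following is a minimal sketch, not code from our application, of translating the '*' / '?' patterns listed below into jcr:like predicates. The class and method names are hypothetical, and the doubling of apostrophes assumes the usual XPath string-literal rule.

```java
final class LikePattern {
    private LikePattern() {}

    /**
     * Translates a '*' / '?' wildcard pattern into jcr:like syntax.
     * '*' becomes '%', '?' becomes '_', and literal '%', '_' and '\'
     * are escaped with a backslash so they match only themselves.
     */
    static String toLikePattern(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            switch (c) {
                case '*':
                    sb.append('%');
                    break;
                case '?':
                    sb.append('_');
                    break;
                case '%':
                case '_':
                case '\\':
                    sb.append('\\').append(c);
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a jcr:like predicate; apostrophes are doubled (assumed XPath literal rule). */
    static String likePredicate(String property, String wildcard) {
        String literal = toLikePattern(wildcard).replace("'", "''");
        return "jcr:like(@" + property + ", '" + literal + "')";
    }

    public static void main(String[] args) {
        // prints: jcr:like(@fullName, '%East_West\_Land')
        System.out.println(likePredicate("fullName", "*East?West_Land"));
    }
}
```

The resulting predicate would go inside the brackets of an XPath query in the same position as the jcr:contains expressions shown below.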

Given there are two objects in our repository with the following 
fullName properties:

    .North.South.East.WestLand
    .North.South.East.West_Land


Both of the following work fine, and each returns its respective object:

    (jcr:contains(@fullName, '.North.South.East.WestLand'))
    (jcr:contains(@fullName, '.North.South.East.West_Land'))


The following jcr:contains queries return BOTH objects successfully:

    *North*
    .North*
    .North.*

The following queries successfully return the FIRST object:

    *.South.East.WestLand
    .*.South.East.WestLand
    *South*.WestLand
    *East.WestLand
    *.WestLand
    *East?WestLand
    *?WestLand
    *North.South.East.WestLand

The following jcr:contains queries, identical to those above except for 
the underscore, return nothing, although I would expect the SECOND object:

    *.South.East.West_Land
    .*.South.East.West_Land
    *South*.West_Land
    *East.West_Land
    *.West_Land
    *East?West_Land
    *?West_Land
    *North.South.East.West_Land

UPDATE: After writing this long message, I remembered something. (I have 
been tackling this off and on for weeks, so please bear with the slight 
memory loss; maybe having seen all this will help others.) I remember 
reading somewhere that Lucene treats underscores as token dividers. So 
when a property value contains an underscore, it is split into tokens 
and the underscore is dropped completely. That could explain why the 
exact-name search still works while the wildcard searches do not. (Is 
this correct?) The above examples used the StandardAnalyzer. I 
previously tried the WhitespaceAnalyzer, but doing so disabled leading 
wildcard searches, which our clients absolutely require. I know there is 
a way to turn on leading wildcard searches, but I could not work out how 
to do it in Jackrabbit. Any advice on an Analyzer that would satisfy our 
clients would be GREATLY appreciated.
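
To make the hypothesis above concrete, here is a small self-contained sketch. It is NOT Lucene's StandardAnalyzer and does not use any Lucene or Jackrabbit API; it is a deliberately simplified model that assumes only that the underscore splits a value into lowercase tokens, that a wildcard query must match a single token, and that a query without wildcards is tokenized the same way and matched as a phrase. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

final class UnderscoreTokenModel {
    private UnderscoreTokenModel() {}

    /** Assumption: the underscore acts as a token divider and is dropped. */
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("_")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** Converts a '*' / '?' pattern into a regex; every other character is literal. */
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    /** Models a search of one stored value under the assumptions stated above. */
    static boolean matches(String stored, String query) {
        List<String> tokens = tokenize(stored);
        boolean hasWildcard = query.indexOf('*') >= 0 || query.indexOf('?') >= 0;
        if (!hasWildcard) {
            // Exact query: tokenized too, then matched as consecutive tokens.
            return Collections.indexOfSubList(tokens, tokenize(query)) >= 0;
        }
        // Wildcard query: not tokenized, must match one whole token.
        Pattern pattern = toRegex(query);
        for (String token : tokens) {
            if (pattern.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String second = ".North.South.East.West_Land";
        System.out.println(tokenize(second));                          // [.north.south.east.west, land]
        System.out.println(matches(second, second));                   // true
        System.out.println(matches(second, "*.South.East.West_Land")); // false
        System.out.println(matches(second, ".North.*"));               // true
    }
}
```

Under these assumptions the model reproduces every result listed above: no single token ever contains an underscore, so a wildcard pattern that spans the underscore has nothing to match, while the exact name is split in the same way on both sides and still lines up.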

Thanks for your time and patience,
H. Wilson


Re: jcr:contains with wildcards and underscores

Posted by "H. Wilson" <wi...@randdss.com>.
For anyone who stumbles onto this post with the same problem, head 
over to http://markmail.org/thread/t5hmrob3jdmz7nqm for more 
discussion and the solution that ended up working for us.

H. Wilson
