Posted to users@jackrabbit.apache.org by Robert Haycock <Ro...@artificial-solutions.com> on 2012/12/14 18:33:38 UTC

Search - contains with wildcard and UUIDs and dashes

Hi,

I have a node with 2 properties,

- String::name = "bit of everything"
- UUID::id = "fadeb7f3-224c-48e5-a92f-ca6e1939fa3b"

which are persisted as string node properties.

The following SQL searches return the node...

WHERE CONTAINS(document.name, 'everything')
WHERE CONTAINS(document.name, 'everything*')
WHERE CONTAINS(document.name, '*everything')
WHERE CONTAINS(document.name, '*everything*')
WHERE CONTAINS(document.id, 'fadeb7f3-224c-48e5-a92f-ca6e1939fa3b')
WHERE CONTAINS(document.id, 'fadeb7f3\-224c\-48e5\-a92f\-ca6e1939fa3b')

These don't work...

WHERE CONTAINS(document.id, 'fadeb7f3-224c-48e5-a92f-ca6e1939fa3b*')  // Note the wildcard on the end
WHERE CONTAINS(document.id, 'fadeb7f3*')

Please could someone shed some light on why this doesn't work and how I can make it work. I realise there is no reason for trying to do a wildcard search on a UUID but there comes a time when we just do what we're told!!

Thanks,

Rob.

Re: Search - contains with wildcard and UUIDs and dashes

Posted by joe verderber <jj...@gmail.com>.
Sounds less like a bug and more like the behavior of a UID data type, which
would also be case insensitive. Take, for example, your UIDs in SQL database
commands.

On Monday, December 17, 2012, Robert Haycock wrote:

> The fact is it finds the node when searching with the whole ID and no
> wildcard but doesn't work when you add a wildcard.
>
> That in my eyes is clearly a bug.
>
> -----Original Message-----
> From: Alexander Klimetschek [mailto:aklimets@adobe.com]
> Sent: 14 December 2012 18:54
> To: users@jackrabbit.apache.org
> Subject: Re: Search - contains with wildcard and UUIDs and dashes
>
> On 14.12.2012, at 18:33, Robert Haycock <Robert.Haycock@artificial-solutions.com> wrote:
>
> > These don't work...
> >
> > WHERE CONTAINS(document.id, 'fadeb7f3-224c-48e5-a92f-ca6e1939fa3b*')  // Note the wildcard on the end
> > WHERE CONTAINS(document.id, 'fadeb7f3*')
> >
> > Please could someone shed some light on why this doesn't work and how I
> > can make it work. I realise there is no reason for trying to do a wildcard
> > search on a UUID but there comes a time when we just do what we're told!!
>
> CONTAINS() does a full text search, which doesn't work well for fixed "ID"
> strings, but is meant for free form human text search. It is subject to
> word splitting, stemming etc.
>
> You probably want to do a LIKE (which uses % as the * wildcard char):
>
> WHERE document.id LIKE 'fadeb7f3%'
>
> [0] http://www.day.com/specs/jcr/1.0/8.5.4.4_LIKE.html
>
> Cheers,
> Alex
>



RE: Search - contains with wildcard and UUIDs and dashes

Posted by Robert Haycock <Ro...@artificial-solutions.com>.
Because this is for a user searching on the repository. They can search on all properties. I don't want to have to write a regular expression to see if the user is searching for a UUID just so I can tailor the search.
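
One way to avoid detecting UUIDs up front is to send the same user term to both kinds of predicate and OR them together. The following is only a sketch of that idea, not something proposed in this thread or tested against Jackrabbit. The helper names (`sql_literal`, `build_where`), the split into "text" and "id" properties, and the assumption that the query language accepts `CONTAINS(...) OR ... LIKE ...` are all illustrative assumptions.

```python
def sql_literal(text):
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_where(term, text_props, id_props, selector="document"):
    """Build one WHERE clause that tries full-text and LIKE matching together.

    Hypothetical helper: free-text properties go through CONTAINS with the
    term unchanged, while identifier-like properties use LIKE, with the
    user's '*' wildcard translated to '%'.
    """
    clauses = [
        "CONTAINS(%s.%s, %s)" % (selector, prop, sql_literal(term))
        for prop in text_props
    ]
    # Note: any '%' or '_' already present in the term keeps its LIKE meaning;
    # escaping them is deliberately left out of this sketch.
    like_pattern = term.replace("*", "%")
    clauses += [
        "%s.%s LIKE %s" % (selector, prop, sql_literal(like_pattern))
        for prop in id_props
    ]
    return "WHERE " + " OR ".join(clauses)


if __name__ == "__main__":
    print(build_where("fadeb7f3*", ["name"], ["id"]))
```

With this shape the caller never has to decide whether the input "looks like" a UUID; the cost is that every search runs both predicates.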

-----Original Message-----
From: Alexander Klimetschek [mailto:aklimets@adobe.com] 
Sent: 17 December 2012 20:06
To: users@jackrabbit.apache.org
Subject: Re: Search - contains with wildcard and UUIDs and dashes

On 17.12.2012, at 18:02, Robert Haycock <Ro...@artificial-solutions.com> wrote:

> I don't follow you. I'm just storing the id as a string.

Then why do you want to use full text search on it?

Cheers,
Alex

Re: Search - contains with wildcard and UUIDs and dashes

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 17.12.2012, at 18:02, Robert Haycock <Ro...@artificial-solutions.com> wrote:

> I don't follow you. I'm just storing the id as a string.

Then why do you want to use full text search on it?

Cheers,
Alex

RE: Search - contains with wildcard and UUIDs and dashes

Posted by Robert Haycock <Ro...@artificial-solutions.com>.
I don't follow you. I'm just storing the id as a string.

Even with stemming etc., I should still be able to search on the first few characters followed by a wildcard, shouldn't I? E.g. "abc*"

Wouldn't aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee be indexed as either "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee" or "aaaaaaaa", "bbbb", "cccc", "dddd" and "eeeeeeeeeeee"? In both cases I should be able to search on "aaaaa*" and get a hit.
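
The argument above can be written down as a toy model. This sketch is not Jackrabbit's or Lucene's actual analyzer or query parser; the `matches` function and both "indexing" lists are assumptions made purely to illustrate the reasoning, so it cannot explain why the real query fails.

```python
def matches(indexed_terms, query):
    """Toy term lookup: a trailing '*' means prefix match, otherwise exact match."""
    if query.endswith("*"):
        prefix = query[:-1]
        return any(term.startswith(prefix) for term in indexed_terms)
    return query in indexed_terms


uuid = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"

# Hypothesis 1: the whole UUID is indexed as a single term.
whole = [uuid]
# Hypothesis 2: the UUID is split on dashes into five terms.
split = uuid.split("-")

if __name__ == "__main__":
    print(matches(whole, "aaaaa*"))
    print(matches(split, "aaaaa*"))
```

Under either hypothesis the prefix query finds a term in this model, which is the expectation stated above. The real index may treat wildcard terms differently from plain terms, which this model does not capture.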

-----Original Message-----
From: Alexander Klimetschek [mailto:aklimets@adobe.com] 
Sent: 17 December 2012 16:33
To: users@jackrabbit.apache.org
Subject: Re: Search - contains with wildcard and UUIDs and dashes

On 17.12.2012, at 15:38, Robert Haycock <Ro...@artificial-solutions.com> wrote:

> The fact is it finds the node when searching with the whole ID and no wildcard but doesn't work when you add a wildcard.
> 
> That in my eyes is clearly a bug.

As I mentioned, contains() is full text search and is never to be expected to work with formal identifiers. Using wildcards makes it even more "fuzzy".

Cheers,
Alex


Re: Search - contains with wildcard and UUIDs and dashes

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 17.12.2012, at 15:38, Robert Haycock <Ro...@artificial-solutions.com> wrote:

> The fact is it finds the node when searching with the whole ID and no wildcard but doesn't work when you add a wildcard.
> 
> That in my eyes is clearly a bug.

As I mentioned, contains() is full text search and is never to be expected to work with formal identifiers. Using wildcards makes it even more "fuzzy".

Cheers,
Alex


RE: Search - contains with wildcard and UUIDs and dashes

Posted by Robert Haycock <Ro...@artificial-solutions.com>.
The fact is it finds the node when searching with the whole ID and no wildcard but doesn't work when you add a wildcard.

That in my eyes is clearly a bug.

-----Original Message-----
From: Alexander Klimetschek [mailto:aklimets@adobe.com] 
Sent: 14 December 2012 18:54
To: users@jackrabbit.apache.org
Subject: Re: Search - contains with wildcard and UUIDs and dashes

On 14.12.2012, at 18:33, Robert Haycock <Ro...@artificial-solutions.com> wrote:

> These don't work...
> 
> WHERE CONTAINS(document.id, 'fadeb7f3-224c-48e5-a92f-ca6e1939fa3b*')  // Note the wildcard on the end
> WHERE CONTAINS(document.id, 'fadeb7f3*')
> 
> Please could someone shed some light on why this doesn't work and how I can make it work. I realise there is no reason for trying to do a wildcard search on a UUID but there comes a time when we just do what we're told!!

CONTAINS() does a full text search, which doesn't work well for fixed "ID" strings, but is meant for free form human text search. It is subject to word splitting, stemming etc.

You probably want to do a LIKE (which uses % as the * wildcard char):

WHERE document.id LIKE 'fadeb7f3%'

[0] http://www.day.com/specs/jcr/1.0/8.5.4.4_LIKE.html

Cheers,
Alex

Re: Search - contains with wildcard and UUIDs and dashes

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 14.12.2012, at 18:33, Robert Haycock <Ro...@artificial-solutions.com> wrote:

> These don't work...
> 
> WHERE CONTAINS(document.id, 'fadeb7f3-224c-48e5-a92f-ca6e1939fa3b*')  // Note the wildcard on the end
> WHERE CONTAINS(document.id, 'fadeb7f3*')
> 
> Please could someone shed some light on why this doesn't work and how I can make it work. I realise there is no reason for trying to do a wildcard search on a UUID but there comes a time when we just do what we're told!!

CONTAINS() does a full text search, which doesn't work well for fixed "ID" strings, but is meant for free form human text search. It is subject to word splitting, stemming etc.

You probably want to do a LIKE (which uses % as the * wildcard char):

WHERE document.id LIKE 'fadeb7f3%'

[0] http://www.day.com/specs/jcr/1.0/8.5.4.4_LIKE.html

Cheers,
Alex
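
To make the difference from full-text search concrete, here is a small sketch of how a LIKE pattern compares against the raw stored string, with no word splitting or stemming involved. It only models the two standard LIKE wildcards (`%` for any run of characters, `_` for exactly one character). The function names are hypothetical, and escape handling is omitted, so treat it as an illustration rather than the repository's implementation.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # zero or more characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal, dashes included
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    doc_id = "fadeb7f3-224c-48e5-a92f-ca6e1939fa3b"
    print(like(doc_id, "fadeb7f3%"))
    print(like(doc_id, "224c%"))
    print(like(doc_id, "%224c%"))
```

Because the pattern is applied to the entire string, `'fadeb7f3%'` matches the stored id, while `'224c%'` does not unless a leading `%` is added.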