Posted to users@jackrabbit.apache.org by Gary Long <lo...@magillem.com> on 2010/06/25 11:42:02 UTC

How to handle the colon character within fulltext search?

Hello there :)

I'm using the fulltext search feature of Jackrabbit and I'm facing a
little problem with the colon character (:). For example, if I search
for an e-mail whose subject is "Tr : Tr : your response", I can't find it.
If I search for "your response" the e-mail is found.

My SQL query is:

SELECT * FROM mnt:resource WHERE (contains(jcr:text, '*tr: tr: your
response*') OR contains(jcr:name, '*tr: tr: your response*'));

Thank you for your help :)

Gary Long



Re: How to handle the colon character within fulltext search?

Posted by Alexander Klimetschek <ak...@day.com>.
On Fri, Jun 25, 2010 at 13:59, Gary Long <lo...@magillem.com> wrote:
> I tried this method but it didn't do anything : /
>
> Here is my code :
>
> String param = "Tr: Tr: your response";
> String escapedParam =
>     org.apache.jackrabbit.util.Text.escapeIllegalXpathSearchChars(param);
> String query = "SELECT * FROM mnt:resource WHERE (contains(jcr:text, '*"
>     + escapedParam + "*') OR contains(jcr:name, '*" + escapedParam + "*'))";
>
> In debug mode, I looked at the value of textQuery in the query and it is
> still "Tr: Tr your response". The colon character doesn't seem to be
> escaped. : /

Oh, this might be because you are using SQL. I thought the method
would work for both cases, as it is essentially an escaping for the
Lucene full text search (IIUC).

Something must happen in the SQL query parser then, since
escapeIllegalXpathSearchChars() clearly escapes colons. See
http://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit-jcr-commons/src/main/java/org/apache/jackrabbit/util/Text.java

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: How to handle the colon character within fulltext search?

Posted by Gary Long <lo...@magillem.com>.
On 25/06/2010 14:19, Ard Schrijvers wrote:
> Hello Gary,
>
> in the end, the part in the contains function gets delegated to the
> Lucene QueryParser. So you can use Lucene query syntax in contains,
> for example query-time boosting like 'myterm^10' (unless it gets
> swallowed by the XPath/SQL parser of Jackrabbit, like the ~ fuzzy
> char).
>
> Anyway, in the Lucene query parser a colon means that you search within
> a specific field, see [1] at *Fields*.
>
> At the end of that page, it is explained how to escape special chars (use \).
>
> However, prefixing it again with a wildcard does not seem to work when
> I test it. I did not test it directly against Lucene, so it is hard to say
> whether this is a Lucene QueryParser constraint in combination with
> query expansion for the wildcard or a Jackrabbit issue.
>
> That said, I think in the end you do not want to use the prefix
> wildcard anyway: you'll run into terrible performance and memory
> usage problems. This is a general problem with inverted indexes (which
> you can circumvent by indexing every term reversed as well, but that is
> not done by Jackrabbit of course).
>
> Anyway, the working solution to your problem is to use 'like'. You
> are not actually doing a free text search (free text works on Lucene
> terms, not on sentences).
>
> The XPath equivalent that works is for example:
>
> //*[jcr:like(@myprop, 'my:colon having sentence')]
>
> Though again, jcr:like scales badly with respect to performance and memory.
>
> Regards Ard
>
> [1] http://lucene.apache.org/java/2_4_0/queryparsersyntax.html
>

Hello :)

I'll try to use XPath instead of SQL to run the query, but there is
something I'm not sure about: while using XPath, is it possible to
specify multiple jcr:like or multiple jcr:contains constraints in a
single query?

I read the documentation at [1] but there is no specific example.

How would you translate the following SQL query into XPath:

SELECT * FROM mnt:resource WHERE (contains(jcr:text, 'my:sentence') OR 
contains(jcr:name, 'my:sentence'))
AND jcr:path LIKE '/projects*'
AND jcr:type <> null;

I have the beginning:
/jcr:root/project//element(*,mnt:resource)[jcr:contains(@jcr:text,
'my:sentence')] ... and I don't know how to write the OR :-\ ?!

Thank you for your help :)

Regards,
Gary
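
A possible shape for the XPath query asked about above, as a hedged sketch
only: it assumes that JCR XPath predicates can be combined with "or" and
"and", and that a bare @property inside a predicate tests that the property
exists. The node type and property names are copied from the question as-is,
the class and method names are made up for illustration, and the base path is
a parameter because the question uses both /projects and /project. The sketch
only builds the statement string and has not been run against a repository.

```java
final class XPathOrSketch {

    /** Wraps a value in single quotes, doubling any embedded single quote. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /**
     * Builds an XPath statement that matches mnt:resource nodes below basePath
     * where either property contains the search text and jcr:type is present.
     */
    static String build(String basePath, String searchText) {
        String literal = quote(searchText);
        return "/jcr:root" + basePath + "//element(*, mnt:resource)"
                + "[(jcr:contains(@jcr:text, " + literal + ")"
                + " or jcr:contains(@jcr:name, " + literal + "))"
                + " and @jcr:type]";
    }

    public static void main(String[] args) {
        // Only prints the statement; executing it needs a JCR session.
        System.out.println(build("/projects", "your response"));
    }
}
```

The parentheses around the two jcr:contains calls keep the "or" from being
absorbed by the following "and".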

Re: How to handle the colon character within fulltext search?

Posted by Ard Schrijvers <a....@onehippo.com>.
Hello Gary,

in the end, the part in the contains function gets delegated to the
Lucene QueryParser. So you can use Lucene query syntax in contains,
for example query-time boosting like 'myterm^10' (unless it gets
swallowed by the XPath/SQL parser of Jackrabbit, like the ~ fuzzy
char).

Anyway, in the Lucene query parser a colon means that you search within
a specific field, see [1] at *Fields*.

At the end of that page, it is explained how to escape special chars (use \).
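
To make the backslash escaping concrete, here is a minimal standalone sketch.
It is not the Jackrabbit or Lucene implementation: the class and method names
are hypothetical, and the set of special characters is my reading of the list
on the query parser syntax page [1].

```java
final class LuceneEscapeSketch {

    // Characters treated as special here: + - & | ! ( ) { } [ ] ^ " ~ * ? : \
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    /** Returns the input with a backslash in front of every special character. */
    static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length() + 8);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: Tr\: Tr\: your response
        System.out.println(escape("Tr: Tr: your response"));
    }
}
```

Note that this also escapes * and ?, so any wildcard has to be added after
escaping the user input, not before.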

However, prefixing it again with a wildcard does not seem to work when
I test it. I did not test it directly against Lucene, so it is hard to say
whether this is a Lucene QueryParser constraint in combination with
query expansion for the wildcard or a Jackrabbit issue.

That said, I think in the end you do not want to use the prefix
wildcard anyway: you'll run into terrible performance and memory
usage problems. This is a general problem with inverted indexes (which
you can circumvent by indexing every term reversed as well, but that is
not done by Jackrabbit of course).
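
The workaround mentioned in the previous paragraph (also indexing each term
back to front) can be illustrated with a toy example. This is only a sketch
of the idea, using a sorted in-memory set in place of a real term dictionary.
It is not how Lucene or Jackrabbit store terms, and all names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class ReversedTermIndexSketch {

    // Sorted set of reversed terms, standing in for a second term dictionary.
    private final TreeSet<String> reversedTerms = new TreeSet<>();

    void add(String term) {
        reversedTerms.add(new StringBuilder(term).reverse().toString());
    }

    /** Answers the leading-wildcard query "*suffix" with a prefix range scan. */
    List<String> endingWith(String suffix) {
        String prefix = new StringBuilder(suffix).reverse().toString();
        List<String> hits = new ArrayList<>();
        // tailSet starts at the first candidate; stop when the prefix no longer matches.
        for (String reversed : reversedTerms.tailSet(prefix)) {
            if (!reversed.startsWith(prefix)) {
                break;
            }
            hits.add(new StringBuilder(reversed).reverse().toString());
        }
        return hits;
    }
}
```

A query such as *se then only touches the terms that actually end in "se",
instead of walking the whole dictionary, at the cost of storing every term
twice.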

Anyway, the working solution to your problem is to use 'like'. You
are not actually doing a free text search (free text works on Lucene
terms, not on sentences).

The XPath equivalent that works is for example:

//*[jcr:like(@myprop, 'my:colon having sentence')]

Though again, jcr:like scales badly with respect to performance and memory.
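
For a "contains this sentence" match, the jcr:like pattern needs % wildcards
around the value. The sketch below builds such a statement. It assumes, as I
understand the JCR 1.0 specification, that % and _ are the jcr:like wildcards
and that a backslash escapes them. The helper names are hypothetical and
@myprop is just the placeholder property from the example above.

```java
final class JcrLikeSketch {

    /** Escapes the jcr:like wildcards % and _ and the backslash itself. */
    static String escapeLike(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 8);
        for (char c : value.toCharArray()) {
            if (c == '%' || c == '_' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds an XPath statement matching nodes whose property contains the value. */
    static String containsSubstring(String property, String value) {
        // Single quotes are doubled so the pattern stays one XPath string literal.
        String pattern = ("%" + escapeLike(value) + "%").replace("'", "''");
        return "//*[jcr:like(@" + property + ", '" + pattern + "')]";
    }

    public static void main(String[] args) {
        // prints: //*[jcr:like(@myprop, '%Tr: Tr: your response%')]
        System.out.println(containsSubstring("myprop", "Tr: Tr: your response"));
    }
}
```

The colon needs no escaping here, because the pattern never reaches the
Lucene query parser.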

Regards Ard

[1] http://lucene.apache.org/java/2_4_0/queryparsersyntax.html

Re: How to handle the colon character within fulltext search?

Posted by Gary Long <lo...@magillem.com>.
On 25/06/2010 12:17, Alexander Klimetschek wrote:
> You should escape the query for the contains/jcr:contains function
> using the Text.escapeIllegalXpathSearchChars helper from
> jackrabbit-jcr-commons:
> http://wiki.apache.org/jackrabbit/EncodingAndEscaping#Escaping_values_in_queries
>
> Regards,
> Alex
>
>    
I tried this method but it didn't do anything : /

Here is my code :

String param = "Tr: Tr: your response";
String escapedParam =
    org.apache.jackrabbit.util.Text.escapeIllegalXpathSearchChars(param);
String query = "SELECT * FROM mnt:resource WHERE (contains(jcr:text, '*"
    + escapedParam + "*') OR contains(jcr:name, '*" + escapedParam + "*'))";

In debug mode, I looked at the value of textQuery in the query and it is
still "Tr: Tr your response". The colon character doesn't seem to be
escaped. : /

Regards,
Gary




Re: How to handle the colon character within fulltext search?

Posted by Alexander Klimetschek <ak...@day.com>.
You should escape the query for the contains/jcr:contains function
using the Text.escapeIllegalXpathSearchChars helper from
jackrabbit-jcr-commons:
http://wiki.apache.org/jackrabbit/EncodingAndEscaping#Escaping_values_in_queries

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com