Posted to users@jackrabbit.apache.org by KÖLL Claus <C....@TIROL.GV.AT> on 2007/11/29 16:43:27 UTC
FullText Search Problem
Hi users,
I want to run a full-text search like this:
/jcr:root/tirolgvat[1]//element(*, nt:base)[jcr:contains(., 'test!')]
Then I get this exception:
javax.jcr.RepositoryException: Exception building query: org.apache.jackrabbit.core.query.lucene.fulltext.ParseException: Encountered "<EOF>" at line 1, column 6.
I know the problem is the "!" sign.
I tried to encode it first with the ISO9075 class; then the query runs, but I get no results.
Any hints are welcome :-)
BR,
Claus
RE: FullText Search Problem
Posted by Ard Schrijvers <a....@hippo.nl>.
Hello,
>
> Hi,
>
> thanks for the information...
> It would be great if someone else could jump in and check whether this is
> a bug. I am not trying to achieve anything special
I already looked into it, and it is a bug. It only happens for '!' at
the end of a word. Can you file a JIRA issue for it?
> .. the exception
> comes from daily work: somebody tried to search for this and
> reported the exception to me. I know that >>1) 'test!' is equal to 'test'
> but the end users don't :-)
Accepted :-)
>
>
> So either I will filter some characters from the search
> string or Jackrabbit should handle it.
> I think the second option would be better.
Think so too
Ard
>
> BR,
> claus
>
Re: FullText Search Problem
Posted by KÖLL Claus <C....@TIROL.GV.AT>.
Hi Thomas,
thanks for the info, this also works fine.
But as described in
https://issues.apache.org/jira/browse/JCR-1248
I think Jackrabbit should handle this internally, e.g. with a utility class, so that we don't get an exception.
BR,
Claus
-----Original Message-----
From: Thomas Mueller [mailto:thomas.tom.mueller@gmail.com]
Sent: Monday, December 3, 2007 14:08
To: users@jackrabbit.apache.org
Subject: Re: FullText Search Problem
Hi,
> > //element(*, nt:base)[jcr:contains(., 'test\!')]
> '\!' is not a valid escape sequence in Java....
Try "//element(*, nt:base)[jcr:contains(., 'test\\!')]";
Regards,
Thomas
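To illustrate Thomas's point, a minimal sketch (the class name is illustrative): in Java source a backslash inside a string literal must itself be escaped, so the literal "test\\!" yields the runtime text test\!, which is what the full-text parser needs to see. The JCR call is shown only as a comment because it needs a repository session, which this sketch assumes rather than provides.

```java
public class EscapedQueryLiteral {

    // In Java source, "\\" denotes ONE backslash character at runtime.
    // Writing "test\!" instead would not compile: \! is not a valid Java escape.
    static final String QUERY =
            "//element(*, nt:base)[jcr:contains(., 'test\\!')]";

    public static void main(String[] args) {
        // Prints: //element(*, nt:base)[jcr:contains(., 'test\!')]
        System.out.println(QUERY);

        // With a JCR session (not part of this sketch) the string would be
        // passed on unchanged, e.g.:
        // session.getWorkspace().getQueryManager()
        //        .createQuery(QUERY, javax.jcr.query.Query.XPATH).execute();
    }
}
```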
Re: FullText Search Problem
Posted by KÖLL Claus <C....@TIROL.GV.AT>.
Hi Marcel,
> //element(*, nt:base)[jcr:contains(., 'test\!')]
'\!' is not a valid escape sequence in Java....
BR,
Claus
-----Original Message-----
From: Marcel Reutegger [mailto:marcel.reutegger@gmx.net]
Sent: Friday, November 30, 2007 15:33
To: users@jackrabbit.apache.org
Subject: Re: Re: Re: FullText Search Problem
KÖLL Claus wrote:
> thanks for the information.
> Can you add your comments to the JIRA issue?
> https://issues.apache.org/jira/browse/JCR-1248
Sure.
> OK, if I try to run the query like this
>
> //element(*, nt:base)[jcr:contains(., 'test\"!\"')]"
>
> it works fine.
Hmm, why did you add the double quotes? I think this should be sufficient:
//element(*, nt:base)[jcr:contains(., 'test\!')]
regards
marcel
Re: Re: FullText Search Problem
Posted by KÖLL Claus <C....@TIROL.GV.AT>.
Hi Marcel,
thanks for the information.
Can you add your comments to the JIRA issue?
https://issues.apache.org/jira/browse/JCR-1248
OK, if I try to run the query like this
//element(*, nt:base)[jcr:contains(., 'test\"!\"')]"
it works fine.
But I think Jackrabbit should handle the query properly when the sign is at the end...
>>What I propose is to limit the set to only those characters that are really required (for
>>example, "!" is equivalent to "-" and to the keyword NOT, so it is redundant) and then to document that set clearly.
Yes, clear documentation is often the problem :-)
>>This however means that you need to escape more than the specified set of
>>characters.
Should we add a utility class that handles this kind of escaping? We already have ISO9075, which handles
names, and ISO8601, which handles date/time values, so it would be nice
to encode search literals as well.
BR,
Claus
-----Original Message-----
From: Marcel Reutegger [mailto:marcel.reutegger@gmx.net]
Sent: Friday, November 30, 2007 11:11
To: users@jackrabbit.apache.org
Subject: Re: Re: FullText Search Problem
KÖLL Claus wrote:
> So either I will filter some characters from the search string or Jackrabbit should handle it.
> I think the second option would be better.
JSR 170 specifies a set of characters that need to be escaped if one wishes to
use them as literal instead of the semantics the spec gives them:
"Within the searchexp literal instances of single quote ("'"), double quote
(""") and hyphen ("-") must be escaped with a backslash ("\"). Backslash itself
must therefore also be escaped, ending up as double backslash ("\\")."
Jackrabbit extended this set to provide additional functionality, e.g. you can
do a fuzzy search: test~
This however means that you need to escape more than the specified set of
characters. Strictly speaking this is a violation of the spec. But without
extending this set of characters additional functionality is very difficult to
implement.
The current set of special characters that need escaping is:
"\\", "+", "-", "!", "(", ")", ":", "^", "[", "]", "\"", "{", "}", "~", "*", "?"
What I propose is to limit the set to only those characters that are really required (for
example, "!" is equivalent to "-" and to the keyword NOT, so it is redundant) and then to document that set clearly.
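A minimal client-side escaping sketch, assuming the character list above is what the full-text parser treats as special (it reflects this 2007 discussion; later Jackrabbit versions may differ). The class and method names are hypothetical, not a Jackrabbit API; the JSR 170 set is included for comparison. Embedding the escaped term in the surrounding XPath string literal may need further quoting, which this sketch does not handle.

```java
public final class FullTextEscaper {

    // Characters the JSR 170 text quoted above says must be escaped inside a
    // jcr:contains() search expression: backslash, single quote, double quote, hyphen.
    static final String JSR170_SPECIAL = "\\'\"-";

    // The larger set Jackrabbit's full-text parser treated as special,
    // as listed in this thread; later versions may differ.
    static final String JACKRABBIT_SPECIAL = "\\+-!():^[]\"{}~*?";

    private FullTextEscaper() {
    }

    /** Prefixes every character of {@code term} found in {@code specials} with a backslash. */
    static String escape(String term, String specials) {
        StringBuilder sb = new StringBuilder(term.length() + 8);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (specials.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String userInput = "test!";
        String escaped = escape(userInput, JACKRABBIT_SPECIAL); // test\!
        String xpath = "//element(*, nt:base)[jcr:contains(., '" + escaped + "')]";
        System.out.println(xpath);
    }
}
```

Escaping with the extended set is the more defensive choice: a trailing "!" is neutralized, at the cost of also disabling fuzzy (~) or wildcard (*) syntax that a user may have typed on purpose.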
regards
marcel
Re: FullText Search Problem
Posted by KÖLL Claus <C....@TIROL.GV.AT>.
Hi,
thanks for the information...
It would be great if someone else could jump in and check whether this is a bug.
I am not trying to achieve anything special; the exception comes from daily work:
somebody tried to search for this and reported the exception to me.
I know that >>1) 'test!' is equal to 'test'
but the end users don't :-)
So either I will filter some characters from the search string or Jackrabbit should handle it.
I think the second option would be better.
BR,
Claus
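If you go the filtering route instead, a hedged sketch: replace each special character with a space before building the query. This leans on Ard's point in this thread that the default analyzer splits on such characters anyway, so 'test!' and 'test' find the same documents; the cost is that end users lose operators such as fuzzy (~) or wildcard (*) search. The class name and character set are assumptions (the list Marcel gives in this thread, plus the single quote, which would otherwise end the XPath string literal).

```java
public final class SearchTermFilter {

    // Characters treated specially by the full-text parser (list from this
    // thread) plus the single quote, which would otherwise terminate the
    // XPath string literal passed to jcr:contains().
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?'";

    private SearchTermFilter() {
    }

    /** Replaces special characters with spaces and normalizes whitespace. */
    static String filter(String userInput) {
        StringBuilder sb = new StringBuilder(userInput.length());
        for (char c : userInput.toCharArray()) {
            sb.append(SPECIAL.indexOf(c) >= 0 ? ' ' : c);
        }
        // Drop leading/trailing blanks, then collapse inner runs of whitespace.
        return sb.toString().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(filter("test!")); // test
        System.out.println(filter("te!st")); // te st
    }
}
```

An input made only of special characters filters down to an empty string; the caller probably wants to skip the query in that case rather than send an empty search expression.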
-----Original Message-----
From: Ard Schrijvers [mailto:a.schrijvers@hippo.nl]
Sent: Thursday, November 29, 2007 20:40
To: users@jackrabbit.apache.org
Subject: RE: FullText Search Problem
> Hi users,
>
> I want to run a full-text search like this:
> /jcr:root/tirolgvat[1]//element(*, nt:base)[jcr:contains(., 'test!')]
>
> Then I get this exception:
> javax.jcr.RepositoryException: Exception building query:
> org.apache.jackrabbit.core.query.lucene.fulltext.ParseException:
> Encountered "<EOF>" at line 1, column 6.
Yes, you are correct. It seems that in LuceneQueryBuilder at
Object visit(TextsearchQueryNode node, Object data) {
it breaks at
Query context = parser.parse(query.toString());
where the parser is o.a.j.core.query.lucene.fulltext.QueryParser. It
seems to break on strings ending with a "!". Unfortunately, I do not have
insight into how the QueryParser works. Perhaps somebody else knows where
to look in the QueryParser.
OTOH, besides this possibly being a bug, what are you trying to achieve
with your query? "jcr:contains(., 'test!')", even if it did not
break, would simply return the same as "jcr:contains(., 'test')". This is
because the query is eventually parsed with a Lucene analyzer, and
strings are tokenized on "!" (at least if your analyzer treats ! as a
delimiter, which the default analyzer in Jackrabbit does, and which you are
probably using). So assuming you use
org.apache.lucene.analysis.standard.StandardAnalyzer (see the workspace
config in [1]):
1) 'test!' is equal to 'test'
2) 'te!st' is equal to 'te' OR 'st' (whether it is OR or AND depends on
the default operator setting)
3) 'te#st' is equal to 'te' OR 'st'
You might think this is strange, but keep in mind that your text is
also indexed with this same analyzer.
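To make the tokenization argument concrete, here is a deliberately simplified stand-in for an analyzer: it lowercases the text and splits on every character that is not a letter or digit. This is an assumption for illustration only; Lucene's StandardAnalyzer uses a more elaborate grammar (for example, it keeps e-mail addresses and host names together and removes English stop words), but this sketch reproduces the three term splits listed above. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public final class SimpleTokenizer {

    private SimpleTokenizer() {
    }

    /**
     * Lowercases the input and splits it on every non-letter/non-digit
     * character, dropping empty pieces. A rough stand-in for an analyzer,
     * NOT Lucene's StandardAnalyzer.
     */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toLowerCase(Locale.ROOT).toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // A delimiter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("test!")); // [test]
        System.out.println(tokenize("te!st")); // [te, st]
        System.out.println(tokenize("te#st")); // [te, st]
    }
}
```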
Hope it is clear,
Regards, Ard
[1] http://jackrabbit.apache.org/doc/config.html
>
> I know the problem is the "!" sign.
>
> I tried to encode it first with the ISO9075 class; then
> the query runs, but I get no results.
>
> Any hints are welcome :-)
>
> BR,
> Claus
>
>