Posted to users@jackrabbit.apache.org by KÖLL Claus <C....@TIROL.GV.AT> on 2007/11/29 16:43:27 UTC

FullText Search Problem

hi users,

i want to make a fulltext search like this ...
/jcr:root/tirolgvat[1]//element(*, nt:base)[jcr:contains(., 'test!')]

then i get this exception...
javax.jcr.RepositoryException: Exception building query: org.apache.jackrabbit.core.query.lucene.fulltext.ParseException: Encountered "<EOF>" at line 1, column 6.

i know the problem is the "!" sign

i tried to encode it first with the ISO9075 class; the query then runs, but i get no results

any hints are welcome :-)

BR,
claus


RE: FullText Search Problem

Posted by Ard Schrijvers <a....@hippo.nl>.
Hello,
> 
> hi ,
> 
> thanks for the informations ...
> it will be fine if someone else jump in and looks if this is 
> a bug i will not achieve something special 

I already looked into it, and it is a bug. It only happens for '!' at
the end of a word. Can you file a JIRA issue for it?

> .. the exception 
> comes from daily work somebody tries to search for this and 
> reported me the exception i know that  >>1) 'test!' is equal to 'test'
> but the endusers not :-)

Accepted :-)

> 
> 
> so either i will filter some characters from the search 
> string or jackrabbit should handle it.
> i think the second one will be better

Think so too

Ard

> 
> BR,
> claus
> 

AW: FullText Search Problem

Posted by KÖLL Claus <C....@TIROL.GV.AT>.
hi thomas,

thanks for the info .. this works also fine
but as described in 
https://issues.apache.org/jira/browse/JCR-1248
i think jackrabbit should handle this internally with a util class, so that we get no exception

BR,
claus

-----Original Message-----
From: Thomas Mueller [mailto:thomas.tom.mueller@gmail.com]
Sent: Monday, 3 December 2007 14:08
To: users@jackrabbit.apache.org
Subject: Re: FullText Search Problem


Hi,

> > //element(*, nt:base)[jcr:contains(., 'test\!')]
> '\!' is not a valid escape sequence in java ....

Try "//element(*, nt:base)[jcr:contains(., 'test\\!')]";

Regards,
Thomas
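
To illustrate the suggestion above: in Java source the backslash itself has to be written as "\\", so the literal below yields a statement with a single backslash in front of the "!". This is only a sketch; the names EscapedQueryDemo and buildStatement are made up for illustration, and nothing here touches a repository.

```java
// Sketch: how a Java string literal turns into the query text the parser sees.
final class EscapedQueryDemo {

    /** Returns the XPath statement with the '!' backslash-escaped. */
    static String buildStatement() {
        // "\\" in Java source is ONE backslash character at runtime,
        // so the full-text parser receives: test\!
        return "//element(*, nt:base)[jcr:contains(., 'test\\!')]";
    }

    public static void main(String[] args) {
        // Prints the statement exactly as it would be handed to the query manager.
        System.out.println(buildStatement());
    }
}
```

The resulting string is what would be passed as the statement to QueryManager.createQuery(statement, Query.XPATH).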

AW: FullText Search Problem

Posted by KÖLL Claus <C....@TIROL.GV.AT>.
hi marcel,

> //element(*, nt:base)[jcr:contains(., 'test\!')]

'\!' is not a valid escape sequence in java ....

BR,
claus


-----Original Message-----
From: Marcel Reutegger [mailto:marcel.reutegger@gmx.net]
Sent: Friday, 30 November 2007 15:33
To: users@jackrabbit.apache.org
Subject: Re: AW: AW: FullText Search Problem


KÖLL Claus wrote:
> thanks for the information
> can you add your comments to the jira issue ?
> https://issues.apache.org/jira/browse/JCR-1248

sure.

> ok, if i try to run the query like this
> 
> //element(*, nt:base)[jcr:contains(., 'test\"!\"')]"
> 
> it works fine

hmm, why did you add the double quotes? I think this should be sufficient:

//element(*, nt:base)[jcr:contains(., 'test\!')]

regards
  marcel

AW: AW: FullText Search Problem

Posted by KÖLL Claus <C....@TIROL.GV.AT>.
hi marcel,

thanks for the information
can you add your comments to the jira issue ?
https://issues.apache.org/jira/browse/JCR-1248

ok, if i try to run the query like this

//element(*, nt:base)[jcr:contains(., 'test\"!\"')]"

it works fine

but i think jackrabbit should handle the query properly if the sign is at the end ..

>>What I propose is to limit the set to only those that are really required. e.g. 
>>the "!" is equivalent to "-" and the keyword NOT. And then clearly document it.

yes, clear documentation is often the problem :-)

>>This however means that you need to escape more than the specified set of 
>>characters.

should we add a util class that handles this kind of escaping? we already have ISO9075,
which handles filenames, and ISO8601, which handles date/time things, so it would be good
to have one for encoding search literals as well

BR,
claus


-----Original Message-----
From: Marcel Reutegger [mailto:marcel.reutegger@gmx.net]
Sent: Friday, 30 November 2007 11:11
To: users@jackrabbit.apache.org
Subject: Re: AW: FullText Search Problem


KÖLL Claus wrote:
> so either i will filter some characters from the search string or jackrabbit should handle it.
> i think the second one will be better

JSR 170 specifies a set of characters that need to be escaped if one wishes to 
use them as literal instead of the semantics the spec gives them:

"Within the searchexp literal instances of single quote ("'"), double quote 
(""") and hyphen ("-") must be escaped with a backslash ("\"). Backslash itself 
must therefore also be escaped, ending up as double backslash ("\\")."

Jackrabbit extended this set to provide additional functionality. e.g. you can 
do a fuzzy search: test~

This however means that you need to escape more than the specified set of 
characters. Strictly speaking this is a violation of the spec. But without 
extending this set of characters additional functionality is very difficult to 
implement.

The current set of special characters that need escaping is:

"\\", "+", "-", "!", "(", ")", ":", "^", "[", "]", "\"", "{", "}", "~", "*", "?"

What I propose is to limit the set to only those that are really required. e.g. 
the "!" is equivalent to "-" and the keyword NOT. And then clearly document it.

regards
  marcel
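
A util class along the lines discussed in this thread could look roughly like the sketch below. It simply puts a backslash in front of every character from the list above. FulltextEscaper and escape are hypothetical names, not an existing Jackrabbit API, and single quotes (which the enclosing XPath string literal also cares about) are deliberately left out.

```java
// Sketch of an escaping helper for full-text search terms.
// Assumption: the set of special characters is exactly the list quoted above.
final class FulltextEscaper {

    // \ + - ! ( ) : ^ [ ] " { } ~ * ?
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?";

    /** Prefixes each special character in the term with a backslash. */
    static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length() + 8);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The end-user input that triggered the exception in this thread.
        String term = escape("test!");
        System.out.println("//element(*, nt:base)[jcr:contains(., '" + term + "')]");
    }
}
```

Escaping everything means that user input can no longer use features such as the fuzzy search "test~"; whether that trade-off is acceptable depends on the application.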

AW: FullText Search Problem

Posted by KÖLL Claus <C....@TIROL.GV.AT>.
hi ,

thanks for the information ...
it would be good if someone else could jump in and check whether this is a bug.
i am not trying to achieve anything special .. the exception comes from daily work:
somebody searched for this and reported the exception to me.
i know that  >>1) 'test!' is equal to 'test'
but the end users do not :-)


so either i will filter some characters from the search string or jackrabbit should handle it.
i think the second one will be better

BR,
claus

-----Original Message-----
From: Ard Schrijvers [mailto:a.schrijvers@hippo.nl]
Sent: Thursday, 29 November 2007 20:40
To: users@jackrabbit.apache.org
Subject: RE: FullText Search Problem




> hi users,
> 
> i want to make a fulltext search like this ...
> /jcr:root/tirolgvat[1]//element(*, nt:base)[jcr:contains(., 'test!')]
> 
> then i get this exception...
> javax.jcr.RepositoryException: Exception building query: 
> org.apache.jackrabbit.core.query.lucene.fulltext.ParseException: Encountered "<EOF>" at line 1, column 6.

Yes, you are correct. It seems that in LuceneQueryBuilder at

Object visit(TextsearchQueryNode node, Object data) {

it breaks at

Query context = parser.parse(query.toString()); 

where the parser is o.a.j.core.query.lucene.fulltext.QueryParser. It
seems to break on string ending with a "!". Unfortunately, I do not have
insight in how the QueryParser works. Perhaps somebody else knows where
to look in the QueryParser .

OTOH, besides this possibly being a bug, what are you trying to achieve
with your query? "jcr:contains(., 'test!')", even if it did not
break, would simply return the same as "jcr:contains(., 'test')". This is
because the query is eventually parsed with a Lucene analyzer, and
strings are tokenized on "!" (at least if your analyzer sees "!" as a
delimiter, which the default analyzer in Jackrabbit does, and which you are
probably using). So, assuming you use
org.apache.lucene.analysis.standard.StandardAnalyzer (see the workspace
config [1]):

1) 'test!' is equal to 'test' 
2) 'te!st' is equal to 'te' OR 'st' (the or is depending on default OR
or AND setting though)
3) 'te#st' is equal to 'te' OR 'st'

You might think it is strange, but you have to realize that your text is
also indexed with this same analyzer.

Hope it is clear,

Regards Ard

[1] http://jackrabbit.apache.org/doc/config.html
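
The three cases above can be reproduced with a rough stand-in for the analyzer. The sketch below is NOT Lucene's StandardAnalyzer; it only assumes that every character that is not a letter or digit acts as a delimiter and that tokens are lowercased. RoughTokenizer and tokens are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Rough approximation of analyzer tokenization (assumption: any
// non-letter, non-digit character separates tokens; tokens are lowercased).
final class RoughTokenizer {

    /** Splits the text into lowercase letter/digit runs. */
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Delimiter reached: close the token collected so far.
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("test!")); // [test]
        System.out.println(tokens("te!st")); // [te, st]
        System.out.println(tokens("te#st")); // [te, st]
    }
}
```

Because the indexed text goes through the same kind of splitting, a search for 'test!' can only ever match the token 'test'.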

> 
> i know the problem is the "!" sign
> 
> i tried to encode it first with the ISO9075 Class but then 
> the query works but i get no results
> 
> any hints are welcome :-)
> 
> BR,
> claus
> 
> 
