You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by PERRIN GOURON Olivier <ol...@xml-ais.com> on 2002/10/31 16:26:02 UTC
RE: problem with non latin characters in the query
Hello,
I am using Lucene to index UTF-16 and UTF-8 files . Those files are
trans-encoded to the right format so that they can be indexable with Lucene.
The index is searched through with queries made from an UTF-16 file.
Everything works fine as long my query file contains latin characters (even
specific french chars such as éàèêoe...)
Problems occur when the UTF-16 query file contains not latin characters. I
have tried russian characters, such as ?, which is \u0418, but Lucene sends
me this error:
Exception in thread "main"
org.apache.lucene.queryParser.TokenMgrError: Lexical
error at line 1, column 8. Encountered: "\u0018" (24), after : ""
at
org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(Unknown
Source)
at org.apache.lucene.queryParser.QueryParser.jj_scan_token(Unknown
Source)
at org.apache.lucene.queryParser.QueryParser.jj_3_1(Unknown Source)
at org.apache.lucene.queryParser.QueryParser.jj_2_1(Unknown Source)
at org.apache.lucene.queryParser.QueryParser.Clause(Unknown Source)
at org.apache.lucene.queryParser.QueryParser.Query(Unknown Source)
at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
at org.apache.lucene.CherchLeTex.main(CherchLeTex.java:51)
Seems that the queryParser doesn't use the right code for ?... I have tried
greek, and I does the same.
Is it due to the analyser? I don't think so since I changed my
StandardAnalyser for the FrenchAnalyser, and it still behaves the same
to the query parser?...
Another problem that gives exactly the same error message occurs when a
world in my query starts whith a local character (éàèêoe...). This is weird,
since local characters do not trigger errors when they are in the middle of
the world.
Have you ever met this problem? I would appreciate your help and advices
Thanks for your consideration
Olivier Perrin-Gouron
AIS Berger-Levrault
--
To unsubscribe, e-mail:
<ma...@jakarta.apache.org>
For additional commands, e-mail:
<ma...@jakarta.apache.org>
--
To unsubscribe, e-mail: <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>