Posted to java-user@lucene.apache.org by Paul Taylor <pa...@fastmail.fm> on 2009/08/22 13:17:43 UTC
Query parser fails on Hangul/Korean
public class Issue3341Test extends TestCase {

    public void testMatchHangul() throws Exception {
        Analyzer analyzer = new StandardAnalyzer();
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, analyzer, true,
                IndexWriter.MaxFieldLength.LIMITED);
        Document doc = new Document();
        doc.add(new Field("name", "키드갱", Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir, true);
        Query q = new QueryParser("name", analyzer).parse("키드갱");
        System.out.println(q.toString());

        Hits hits = searcher.search(q);
        assertEquals(1, hits.length());
    }
}
gives the following:
org.apache.lucene.queryParser.ParseException: Cannot parse '???': '*' or '?' not allowed as first character in WildcardQuery
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:181)
    at org.musicbrainz.search.analysis.Issue3341Test.testMatchHangul(Issue3341Test.java:32)
Why does the parser think it's a wildcard?
(I'm just using the standard analyser because the search could be
performed in any language, but the user doesn't specify the language, so
we don't know which analyser to use. That's okay: I don't expect Lucene
to do anything clever, but I would expect a match when the indexed text
and the query are identical.)
Thanks, Paul
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Query parser fails on Hangul/Korean
Posted by Simon Willnauer <si...@googlemail.com>.
Paul, my first guess would be that your source file encoding is set to
something other than UTF-8. Those characters should be supported by
Lucene: none of them needs more than 16 bits, so I don't see why this
should be caused by Lucene.
I'm pretty sure that's an encoding issue. Are you running on Windows?
hope that helps,
simon
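
If the encoding guess above is right, the effect can be reproduced without Lucene. The following is only a sketch, not code from the thread: the class name EncodingDemo and the helper roundTrip are made up for illustration, and ISO-8859-1 stands in for whatever non-Unicode encoding the source file or build may actually be using. It shows that pushing the three Hangul characters through a charset that cannot represent them leaves the literal string '???', which begins with the query syntax's single-character wildcard.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // The query text from the test, written with Unicode escapes so that this
    // file compiles identically whatever encoding it is saved or read in.
    static final String HANGUL = "\uD0A4\uB4DC\uAC31";

    // Encode to bytes and decode again with the same charset. Characters the
    // charset cannot represent are replaced by its replacement byte ('?').
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String viaUtf8 = roundTrip(HANGUL, StandardCharsets.UTF_8);
        String viaLatin1 = roundTrip(HANGUL, StandardCharsets.ISO_8859_1);

        System.out.println(viaUtf8.equals(HANGUL));    // true
        System.out.println(viaLatin1);                 // ???
        System.out.println(viaLatin1.startsWith("?")); // true
    }
}
```

If this is indeed the cause, two things are worth trying: save the test source as UTF-8 and compile it with javac -encoding UTF-8, or write the literal with Unicode escapes as in the sketch so the source encoding no longer matters.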