Posted to java-user@lucene.apache.org by "Ngo, Anh (ISS Southfield)" <an...@iss.net> on 2006/07/21 16:16:09 UTC
StandardAnalyzer question
Hello
The Lucene 2.0.0 StandardAnalyzer treats the "_" (underscore) as a
token separator. Is there a way to make StandardAnalyzer not split
tokens on "_" or on any other given character?
I'd like to keep all the features that StandardAnalyzer has, but I want
to modify it a bit for my needs. How do I control which characters it
splits on?
Ex: Test_test1_test2 is my data
StandardAnalyzer: Test test1 test2 my data
I'd like to have: Test_test1_test2 my data
Please help.
Thanks,
Anh Ngo
-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
Sent: Wednesday, July 19, 2006 12:25 PM
To: java-user@lucene.apache.org
Subject: Re: BooleanQuery question
: If I search with boolQuery, Lucene doesn't find anything.
: If I modify by hand the query from "+(-(FILE:abstract.htm))
: +(PATH:/bssrs)" to "-(FILE:abstract.htm) +(PATH:/bssrs)", Lucene find
: the correct list of document.
:
: Does somebody know why ?
You can't have a boolean query containing only MUST_NOT clauses (which
is what (-(FILE:abstract.htm)) is). It matches no documents, so the
mandatory qualification on it causes the query to fail for all docs.
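The reasoning above can be sketched with a toy set-based model. This is not Lucene code: the class BooleanQueryModel, the method match, and the document ids are hypothetical, and SHOULD clauses and scoring are ignored. It only assumes that MUST clauses intersect, MUST_NOT clauses subtract, and a query with no positive clause selects nothing to subtract from.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class BooleanQueryModel {
    // Toy model (hypothetical, not the Lucene API): intersect all MUST sets,
    // then remove every document found in a MUST_NOT set.
    // With no MUST clause there is nothing to start from, so the result is empty.
    static Set<Integer> match(List<Set<Integer>> must, List<Set<Integer>> mustNot) {
        if (must.isEmpty()) {
            return new TreeSet<>();
        }
        Set<Integer> result = new TreeSet<>(must.get(0));
        for (Set<Integer> m : must) {
            result.retainAll(m);
        }
        for (Set<Integer> n : mustNot) {
            result.removeAll(n);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> pathBssrs = Set.of(1, 2, 3);   // docs with PATH:/bssrs (made-up ids)
        Set<Integer> fileAbstract = Set.of(2);      // docs with FILE:abstract.htm (made-up ids)

        // +(-(FILE:abstract.htm)) +(PATH:/bssrs)
        // The inner query has only a MUST_NOT clause, so it matches nothing,
        // and requiring it makes the whole query match nothing.
        Set<Integer> inner = match(List.of(), List.of(fileAbstract));
        System.out.println(match(List.of(inner, pathBssrs), List.of()));

        // -(FILE:abstract.htm) +(PATH:/bssrs)
        // Here the MUST_NOT clause sits next to a MUST clause it can subtract from.
        System.out.println(match(List.of(pathBssrs), List.of(fileAbstract)));
    }
}
```

In this model the first query yields an empty set, while the second yields documents 1 and 3, which mirrors the behaviour described in the original question.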
:
: Thanks in advance,
:
: Nicolas
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
: For additional commands, e-mail: java-user-help@lucene.apache.org
:
-Hoss
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: StandardAnalyzer question
Posted by Daniel Naber <lu...@danielnaber.de>.
On Friday 21 July 2006 16:16, Ngo, Anh (ISS Southfield) wrote:
> The lucene 2.0.0 StandardAnalyzer does treat the "_"(underscore) as a
> token. Is there a way I can make StandardAnalyzer don't tokenize for
> "_" or any given characters?
You need to add "_" to the #LETTER definition in StandardTokenizer.jj, then
rebuild StandardTokenizer.java using the appropriate ant task.
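The effect of counting "_" as a letter can be illustrated without Lucene. The following is only a rough sketch using java.util.regex, not the StandardTokenizer grammar: the class UnderscoreTokenDemo and the method tokenize are hypothetical, and lowercasing and stop-word removal (which StandardAnalyzer also performs) are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnderscoreTokenDemo {
    // Toy stand-in for a tokenizer (hypothetical, not Lucene):
    // every maximal run of characters from tokenCharClass becomes one token.
    static List<String> tokenize(String text, String tokenCharClass) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[" + tokenCharClass + "]+").matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Test_test1_test2 is my data";

        // "_" is not a token character, so it acts as a separator.
        System.out.println(tokenize(text, "\\p{L}\\p{N}"));

        // "_" is added to the token characters, so the identifier stays whole.
        System.out.println(tokenize(text, "\\p{L}\\p{N}_"));
    }
}
```

With the first character class the sample text splits into Test, test1 and test2; with the second, Test_test1_test2 survives as a single token. The real change still has to be made in the grammar file and the tokenizer regenerated, as described above.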
Regards
Daniel
--
http://www.danielnaber.de