
Posted to java-user@lucene.apache.org by "Ngo, Anh (ISS Southfield)" <an...@iss.net> on 2006/07/21 16:16:09 UTC

StandardAnalyzer question

Hello

The Lucene 2.0.0 StandardAnalyzer treats "_" (underscore) as a token
separator. Is there a way to make StandardAnalyzer not split tokens on
"_" or on any other given characters?

I'd like to keep all the features StandardAnalyzer has but modify it a
bit for my needs. How do I control which characters split tokens?

Ex: Test_test1_test2 is my data
StandardAnalyzer: Test test1 test2 my data
I'd like to have:  Test_test1_test2 my data


Please help.


Thanks,


Anh Ngo


-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org] 
Sent: Wednesday, July 19, 2006 12:25 PM
To: java-user@lucene.apache.org
Subject: Re: BooleanQuery question


: If I search with boolQuery, Lucene doesn't find anything.
: If I modify the query by hand from "+(-(FILE:abstract.htm))
: +(PATH:/bssrs)" to "-(FILE:abstract.htm) +(PATH:/bssrs)", Lucene finds
: the correct list of documents.
:
: Does somebody know why?

You can't have a boolean query containing only MUST_NOT clauses (which is
what (-(FILE:abstract.htm)) is). It matches no documents, so the mandatory
qualification on it causes the query to fail for all docs.
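
A toy model may make the failure easier to see. The sketch below is a
hypothetical class (PureNegativeDemo), not Lucene code: it treats every
clause as a set of matching document ids, ignores SHOULD clauses and
scoring, and uses invented ids (doc 1 matches FILE:abstract.htm, docs 1
to 3 match PATH:/bssrs).

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of boolean clause semantics; NOT Lucene's API.
public class PureNegativeDemo {

    // One query level: intersect all MUST clauses, then subtract every
    // MUST_NOT clause. With no positive (MUST) clause there is nothing to
    // subtract from, so a pure-negative query matches no documents.
    static Set<Integer> evaluate(List<Set<Integer>> must, List<Set<Integer>> mustNot) {
        Set<Integer> result = new TreeSet<>();
        if (must.isEmpty()) {
            return result; // pure-negative query: empty result
        }
        result.addAll(must.get(0));
        for (Set<Integer> clause : must) {
            result.retainAll(clause);
        }
        for (Set<Integer> clause : mustNot) {
            result.removeAll(clause);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> fileAbstract = Set.of(1);    // invented: docs matching FILE:abstract.htm
        Set<Integer> pathBssrs = Set.of(1, 2, 3); // invented: docs matching PATH:/bssrs

        // +(-(FILE:abstract.htm)) +(PATH:/bssrs)
        Set<Integer> inner = evaluate(List.of(), List.of(fileAbstract));
        Set<Integer> nested = evaluate(List.of(inner, pathBssrs), List.of());

        // -(FILE:abstract.htm) +(PATH:/bssrs)
        Set<Integer> flat = evaluate(List.of(pathBssrs), List.of(fileAbstract));

        System.out.println("nested: " + nested); // nested: []
        System.out.println("flat:   " + flat);   // flat:   [2, 3]
    }
}
```

Under these assumptions the nested query returns nothing because its
mandatory inner clause is empty, while the flat query returns docs 2 and
3, which is the behavior Nicolas observed.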


:
: Thanks in advance,
:
: Nicolas
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
: For additional commands, e-mail: java-user-help@lucene.apache.org
:



-Hoss



Re: StandardAnalyzer question

Posted by Daniel Naber <lu...@danielnaber.de>.

On Friday 21 July 2006 16:16, Ngo, Anh (ISS Southfield) wrote:

> The Lucene 2.0.0 StandardAnalyzer treats "_" (underscore) as a token
> separator. Is there a way to make StandardAnalyzer not split tokens on
> "_" or on any other given characters?

You need to add "_" to the #LETTER definition in StandardTokenizer.jj (the
JavaCC grammar), then regenerate StandardTokenizer.java using the
appropriate ant task.
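
To illustrate the effect of treating "_" as a letter, here is a
hypothetical standalone sketch (TokenCharSketch), not Lucene's
StandardTokenizer. It splits only on characters that are neither letters,
digits, nor listed in the assumed extraTokenChars parameter, and it
lowercases as StandardAnalyzer does. It does not model the grammar's other
rules (acronyms, e-mail addresses, numbers) or stop-word removal, so "is"
survives here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the tokenization rule; NOT Lucene's StandardTokenizer.
public class TokenCharSketch {

    // Letters and digits always belong to a token; characters listed in
    // extraTokenChars (e.g. "_") are treated the same way, which mirrors
    // adding them to the #LETTER definition. Anything else ends a token.
    static List<String> tokenize(String text, String extraTokenChars) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c) || extraTokenChars.indexOf(c) >= 0) {
                current.append(Character.toLowerCase(c)); // lowercase like LowerCaseFilter
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the last token
        }
        return tokens;
    }

    public static void main(String[] args) {
        String data = "Test_test1_test2 is my data";
        System.out.println(tokenize(data, ""));  // [test, test1, test2, is, my, data]
        System.out.println(tokenize(data, "_")); // [test_test1_test2, is, my, data]
    }
}
```

The real change still has to go into the grammar so that everything else
StandardAnalyzer does is preserved; this sketch only shows why listing
"_" as a token character keeps Test_test1_test2 together.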

Regards
 Daniel

-- 
http://www.danielnaber.de
