You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by "Sieretzki, Dionne R, SOLGV" <dh...@att.com> on 2003/03/13 16:15:03 UTC

Searching for hyphenated terms

I have seen some previous postings about "Escape woes" and "Hyphens not matching", but I haven't seen any resolutions to an issue I've been trying to work out.  

I don't want my search field to be case sensitive, so I used StandardAnalyzer.  The search field also has corresponding entries that may or may not contain hyphens or other special characters. If the field is not tokenized, very few search terms result in matches.  It appears that terms are only matched if a wildcard is used, such as:

Entered: ADOG  / Actual Query is: adog / No match on an exact term
Entered: ADOG* / Actual Query is: ADOG* /  Match found
Entered: AAA-ADOG / Actual Query is: aaa -adog / No match 
Entered: "AAA-ADOG" / Actual Query is: "aaa adog" / No match 
Entered: AAA?ADOG /  Actual Query is: aaa?adog / Match found
Entered: DOG.2  / Actual Query is: dog.2 / No match 
Entered: DOG?2 / Actual Query is: DOG?2 /  Match found


If the field is tokenized, then even more mixed results are produced.

Entered: ADOG / Actual Query is: adog / Match found for exact term
Entered: ADOG* / Acutal Query is: ADOG* / No match
Entered: AAA-ADOG / Actual Query is: aaa -adog / Match found
Entered: "AAA-ADOG" / Actual Query is: "aaa adog" / Match found
Entered: DOG.2 / Actual Query is: adog.2  / Match found
Entered: AAA-DOG-BBB / Actual Query is: aaa -dog -bbb / No match
Entered: " AAA-DOG-BBB" / Actual Query is: "aaa dog bbb" / No match
Entered: ADOG-I40 / Actual Query is: adog -i40 / Incorrect matches
Entered: "ADOG-I40" / Actual Query is: adog-i40 / Match found for exact term


Can anyone recommend the right Analyzer to use that isn't case sensitive and matches on both hyphenated and non-hyphenated terms?


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


RE: Searching for hyphenated terms

Posted by Rob Outar <ro...@ideorlando.org>.
I had similar problems that were solved with this Analyzer:

 
    public TokenStream tokenStream(String field, final Reader reader) {

        // do not tokenize any field
        TokenStream t = new CharTokenizer(reader) {
            protected boolean isTokenChar(char c) {
                return true;
            }
        };

        //case insensitive search
        t = new LowerCaseFilter(t);
        return t;

    }

Thanks,
 
Rob 


-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Thursday, March 13, 2003 11:22 AM
To: Lucene Users List
Subject: Re: Searching for hyphenated terms


Make a custom Analyzer.  They are super simple to write.
Take pieces of WhitespaceAnalyzer and the Standard one.

Otis

--- "Sieretzki, Dionne R, SOLGV" <dh...@att.com> wrote:
> I have seen some previous postings about "Escape woes" and "Hyphens
> not matching", but I haven't seen any resolutions to an issue I've
> been trying to work out.  
> 
> I don't want my search field to be case sensitive, so I used
> StandardAnalyzer.  The search field also has corresponding entries
> that may or may not contain hyphens or other special characters. If
> the field is not tokenized, very few search terms result in matches. 
> It appears that terms are only matched if a wildcard is used, such
> as:
> 
> Entered: ADOG  / Actual Query is: adog / No match on an exact term
> Entered: ADOG* / Actual Query is: ADOG* /  Match found
> Entered: AAA-ADOG / Actual Query is: aaa -adog / No match 
> Entered: "AAA-ADOG" / Actual Query is: "aaa adog" / No match 
> Entered: AAA?ADOG /  Actual Query is: aaa?adog / Match found
> Entered: DOG.2  / Actual Query is: dog.2 / No match 
> Entered: DOG?2 / Actual Query is: DOG?2 /  Match found
> 
> 
> If the field is tokenized, then even more mixed results are produced.
> 
> Entered: ADOG / Actual Query is: adog / Match found for exact term
> Entered: ADOG* / Acutal Query is: ADOG* / No match
> Entered: AAA-ADOG / Actual Query is: aaa -adog / Match found
> Entered: "AAA-ADOG" / Actual Query is: "aaa adog" / Match found
> Entered: DOG.2 / Actual Query is: adog.2  / Match found
> Entered: AAA-DOG-BBB / Actual Query is: aaa -dog -bbb / No match
> Entered: " AAA-DOG-BBB" / Actual Query is: "aaa dog bbb" / No match
> Entered: ADOG-I40 / Actual Query is: adog -i40 / Incorrect matches
> Entered: "ADOG-I40" / Actual Query is: adog-i40 / Match found for
> exact term
> 
> 
> Can anyone recommend the right Analyzer to use that isn't case
> sensitive and matches on both hyphenated and non-hyphenated terms?
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 


__________________________________________________
Do you Yahoo!?
Yahoo! Web Hosting - establish your business online
http://webhosting.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Searching for hyphenated terms

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Make a custom Analyzer.  They are super simple to write.
Take pieces of WhitespaceAnalyzer and the Standard one.

Otis

--- "Sieretzki, Dionne R, SOLGV" <dh...@att.com> wrote:
> I have seen some previous postings about "Escape woes" and "Hyphens
> not matching", but I haven't seen any resolutions to an issue I've
> been trying to work out.  
> 
> I don't want my search field to be case sensitive, so I used
> StandardAnalyzer.  The search field also has corresponding entries
> that may or may not contain hyphens or other special characters. If
> the field is not tokenized, very few search terms result in matches. 
> It appears that terms are only matched if a wildcard is used, such
> as:
> 
> Entered: ADOG  / Actual Query is: adog / No match on an exact term
> Entered: ADOG* / Actual Query is: ADOG* /  Match found
> Entered: AAA-ADOG / Actual Query is: aaa -adog / No match 
> Entered: "AAA-ADOG" / Actual Query is: "aaa adog" / No match 
> Entered: AAA?ADOG /  Actual Query is: aaa?adog / Match found
> Entered: DOG.2  / Actual Query is: dog.2 / No match 
> Entered: DOG?2 / Actual Query is: DOG?2 /  Match found
> 
> 
> If the field is tokenized, then even more mixed results are produced.
> 
> Entered: ADOG / Actual Query is: adog / Match found for exact term
> Entered: ADOG* / Acutal Query is: ADOG* / No match
> Entered: AAA-ADOG / Actual Query is: aaa -adog / Match found
> Entered: "AAA-ADOG" / Actual Query is: "aaa adog" / Match found
> Entered: DOG.2 / Actual Query is: adog.2  / Match found
> Entered: AAA-DOG-BBB / Actual Query is: aaa -dog -bbb / No match
> Entered: " AAA-DOG-BBB" / Actual Query is: "aaa dog bbb" / No match
> Entered: ADOG-I40 / Actual Query is: adog -i40 / Incorrect matches
> Entered: "ADOG-I40" / Actual Query is: adog-i40 / Match found for
> exact term
> 
> 
> Can anyone recommend the right Analyzer to use that isn't case
> sensitive and matches on both hyphenated and non-hyphenated terms?
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 


__________________________________________________
Do you Yahoo!?
Yahoo! Web Hosting - establish your business online
http://webhosting.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org