Posted to user@lucenenet.apache.org by sy...@hotmail.com on 2022/11/09 11:31:35 UTC

Including commas in a search

Hi

My users want to be able to include commas in their searches. For example, given the string "John, Steve, Mary and Jane", they want to be able to search for "John, Steve". Their entity names contain multiple person names, so alongside the one above they could also have, say:

"John, Steve, Brad and Terry"
"John, Steve, Brad and Catherine"

Note those are the full names of the entity to be searched.

If the user's search includes a comma, it finds nothing. If, however, my entity names were

"John# Steve# Brad and Terry"
"John#, Steve# Brad and Catherine"

they can use # in the search term and both of the above will be found.

I used the QueryParserUtil.Escape function on my search term, which handles most non-alphanumeric characters but not, it seems, commas. If I try replacing the comma with "\," it also returns nothing, leading me to believe the commas are removed when indexing.

I am using a custom analyzer built on WhitespaceTokenizer with LowerCaseFilter. It is basically the StandardAnalyzer, but with WhitespaceTokenizer because we want to split on whitespace. I don't think that is where the issue stems from, but I include the code in case it helps.

protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
{
    // Split the input on whitespace only.
    var src = new WhitespaceTokenizer(m_matchVersion, reader);

    TokenStream tok = new StandardFilter(m_matchVersion, src);

    // Lowercase each token, then drop stop words.
    tok = new LowerCaseFilter(m_matchVersion, tok);
    tok = new StopFilter(m_matchVersion, tok, m_stopwords);

    return new TokenStreamComponentsAnonymousClass(src, tok);
}
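
One way to check whether commas survive analysis is to print the tokens an analyzer emits for one of the entity names. The following is only a minimal sketch, not the analyzer above: it assumes Lucene.NET 4.8 with the Lucene.Net.Analysis.Common package, uses the stock WhitespaceAnalyzer as a stand-in for the custom analyzer, and the field name "name" and the class name TokenDump are made up for illustration.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class TokenDump
{
    public static void Main()
    {
        // Assumption: WhitespaceAnalyzer stands in for the custom analyzer.
        // Swap in the real analyzer instance to test the actual chain.
        using (Analyzer analyzer = new WhitespaceAnalyzer(LuceneVersion.LUCENE_48))
        // "name" is a hypothetical field name used only for this check.
        using (TokenStream ts = analyzer.GetTokenStream(
            "name", new StringReader("John, Steve, Brad and Terry")))
        {
            ICharTermAttribute term = ts.AddAttribute<ICharTermAttribute>();

            ts.Reset();                      // required before the first IncrementToken()
            while (ts.IncrementToken())
            {
                // Brackets make any punctuation kept inside a token visible.
                Console.WriteLine("[" + term.ToString() + "]");
            }
            ts.End();
        }
    }
}
```

If a token prints with its comma still attached, the comma is in the index as part of that term, and the query side is the place to look.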

The questions are


  1.  Is it possible to search on a comma?
  2.  If it is, at what stage do I need to make a change: at indexing or at search time? Perhaps I need to replace commas with something else before the term is sent to the search. I did try escaping them with backslashes, but that didn't work.

Many thanks

Paul

Re: Including commas in a search - ignore previous

Posted by sy...@hotmail.com.
Please ignore my previous email. I just spotted something: commas can be included in the search after all. The problem is in how we have implemented wildcards, which, in retrospect, I am not sure makes sense.

Thanks,
Paul