Posted to user@lucenenet.apache.org by Patric Forsgard <pa...@tasteful.se> on 2011/04/05 15:28:34 UTC

[Lucene.Net] Index and search for a phrase including plus (+) sign.

Hi.

Is it possible to index documents whose fields contain a phrase that
includes a plus sign? For example:
Doc1: apa + kata
Doc2: apa+kata

I use the StandardAnalyzer both when I create the index and when I
search. I have tried searching with and without quotes around the phrase,
without luck.

If I use the WhitespaceAnalyzer I can search for the phrase "apa+kata" and
get a hit on Doc2, but if I search for "apa + kata" I do not get any
hits.
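
To make the difference between the two documents concrete, here is a minimal
sketch that splits text on whitespace only, which is roughly what a
whitespace-based analyzer does. It is a plain C# stand-in and does not call
Lucene.Net; the class and method names are made up for illustration.

```csharp
using System;

public static class WhitespaceSplitDemo
{
    // Splits on whitespace only. This is an illustrative stand-in,
    // not Lucene.Net code.
    public static string[] Tokens(string text)
    {
        // A null separator array makes Split use whitespace as the delimiter.
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", Tokens("apa + kata"))); // apa|+|kata
        Console.WriteLine(string.Join("|", Tokens("apa+kata")));   // apa+kata
    }
}
```

Under this splitting, Doc1 yields three tokens and Doc2 yields a single token,
so the two texts are not interchangeable at search time.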

I tried to find a good answer with Google, but searching for "+" in
Google was not easy... Any suggestions on whether this is possible and, if
so, which analyzer I should use?

// Patric

Re: [Lucene.Net] Index and search for a phrase including plus (+) sign.

Posted by K a r n a v <ka...@gmail.com>.
I use the method below both when indexing and when searching. For example,
for a field such as a company description, I store the original description
as-is in the index, plus a second field holding the same text with special
characters such as + and - replaced by predefined codes. Results shown to
the user come from the original field; searches run against the encoded
field.

I don't think this is as efficient as writing a custom analyzer, but it
works for me for both indexing and searching. If anyone writes an analyzer
that identifies special characters and replaces them with codes, please
reply with the implementation.
Thank you.

        public static string GetSearchTerm_By_Replacing_SpecialChars_With_DEFINEDCODE(
            string sKeyword, bool bReplaceSPACEWithCode, bool bReplaceDBLQuoteWithCode,
            string sCharsEscList)
        {
            sKeyword = sKeyword.Trim();

            if (bReplaceSPACEWithCode)
                sKeyword = sKeyword.Replace(" ", "WSCSSPACE"); // replace space with code

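            if (bReplaceDBLQuoteWithCode)
                sKeyword = sKeyword.Replace("\"", " WSCDQOTE "); // replace double quote with code
            else
                // Note: the backslash added here is itself encoded further down
                // unless sCharsEscList contains "\\".
                sKeyword = sKeyword.Replace("\"", "\\\"");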


            //sKeyword = sKeyword.Replace("‘","CLSQ");// ‘ left single quote
            if(!sCharsEscList.Contains("!"))
                sKeyword = sKeyword.Replace("!", " WSCEXCLA ");// exclamation mark
            if (!sCharsEscList.Contains("#"))
                sKeyword = sKeyword.Replace("#", " WSCHASH ");// hash sign
            if (!sCharsEscList.Contains("$"))
                sKeyword = sKeyword.Replace("$", " WSCUSD ");//
            if (!sCharsEscList.Contains("%"))
                sKeyword = sKeyword.Replace("%", " WSCPRCNT ");//
            if (!sCharsEscList.Contains("&"))
                sKeyword = sKeyword.Replace("&", " WSCAMPSND ");//
            if (!sCharsEscList.Contains("'"))
                sKeyword = sKeyword.Replace("'", " WSCSQ ");// single quote
            if (!sCharsEscList.Contains(":"))
                sKeyword = sKeyword.Replace(":", " WSCCOLN ");//
            if (!sCharsEscList.Contains(";"))
                sKeyword = sKeyword.Replace(";", " WSCSEMCOL ");//
            if (!sCharsEscList.Contains("?"))
                sKeyword = sKeyword.Replace("?", " WSCQSTN ");//
            if (!sCharsEscList.Contains("/"))
                sKeyword = sKeyword.Replace("/", " WSCFSLSH ");//
            if (!sCharsEscList.Contains("\\"))
                sKeyword = sKeyword.Replace("\\", " WSCBSLSH ");//
            if (!sCharsEscList.Contains(">"))
                sKeyword = sKeyword.Replace(">", " WSCGTN ");//
            if (!sCharsEscList.Contains("<"))
                sKeyword = sKeyword.Replace("<", " WSCLTN ");//
            if (!sCharsEscList.Contains("."))
                sKeyword = sKeyword.Replace(".", " WSCDOT ");//
            if (!sCharsEscList.Contains("~"))
                sKeyword = sKeyword.Replace("~", " WSCTILDA ");//
            if (!sCharsEscList.Contains("@"))
                sKeyword = sKeyword.Replace("@", " WSCATRATE ");//
            if (!sCharsEscList.Contains("^"))
                sKeyword = sKeyword.Replace("^", " WSCCAP ");//
            if (!sCharsEscList.Contains("*"))
                sKeyword = sKeyword.Replace("*", " WSCASTRK ");//
            if (!sCharsEscList.Contains("("))
                sKeyword = sKeyword.Replace("(", " WSCLFTP ");//
            if (!sCharsEscList.Contains(")"))
                sKeyword = sKeyword.Replace(")", " WSCRHTP ");//
            if (!sCharsEscList.Contains("["))
                sKeyword = sKeyword.Replace("[", " WSCLBRKT ");//
            if (!sCharsEscList.Contains("]"))
                sKeyword = sKeyword.Replace("]", " WSCRBRKT ");//
            if (!sCharsEscList.Contains("{"))
                sKeyword = sKeyword.Replace("{", " WSCLFLBRKT ");//
            if (!sCharsEscList.Contains("}"))
                sKeyword = sKeyword.Replace("}", " WSCRFLBRKT ");//
            if (!sCharsEscList.Contains("|"))
                sKeyword = sKeyword.Replace("|", " WSCPIPE ");//
            if (!sCharsEscList.Contains("="))
                sKeyword = sKeyword.Replace("=", " WSCEQUAL ");//
            if (!sCharsEscList.Contains("+"))
                sKeyword = sKeyword.Replace("+", " WSCPLUS ");//
            if (!sCharsEscList.Contains("-"))
                sKeyword = sKeyword.Replace("-", " WSCHIFN ");// minus or hyphen
            if (!sCharsEscList.Contains("_"))
                sKeyword = sKeyword.Replace("_", " WSCUNDRSC ");// underscore
            if (!sCharsEscList.Contains(","))
                sKeyword = sKeyword.Replace(",", " WSCCOMMA ");// comma



            return sKeyword;
        }
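
The same idea can be written more compactly with a lookup table. The sketch
below is only an illustration of the approach above, not a drop-in
replacement: the class name, the `keep` parameter and the four-entry table
are assumptions, and the table would need to be extended to cover the full
list of characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SpecialCharEncoder
{
    // Hypothetical subset of the character-to-code table; extend as needed.
    private static readonly Dictionary<char, string> Codes = new Dictionary<char, string>
    {
        { '+', "WSCPLUS" },
        { '-', "WSCHIFN" },
        { '&', "WSCAMPSND" },
        { '#', "WSCHASH" },
    };

    // Replaces each mapped character with its code, padded with spaces so the
    // code becomes a separate token. Characters listed in 'keep' are left alone.
    public static string Encode(string text, string keep = "")
    {
        var sb = new StringBuilder();
        foreach (char c in text.Trim())
        {
            string code;
            if (keep.IndexOf(c) < 0 && Codes.TryGetValue(c, out code))
                sb.Append(' ').Append(code).Append(' ');
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Encode("apa+kata"));      // apa WSCPLUS kata
        Console.WriteLine(Encode("apa+kata", "+")); // apa+kata
    }
}
```

The encoded text goes into the search-only field, and the same `Encode` call
is applied to the user's search text before it is parsed, so both sides use
the same codes.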


On Wed, Apr 6, 2011 at 12:43 AM, digy digy <di...@gmail.com> wrote:

> Working with "+" can be problematic and you may need to write your own
> custom analyzer.
> You can use code below to see what you are indexing and searching.



-- 
Thanks & Regards,
Karunaker Reddy V

http://www.flickr.com/photos/karnav/


Re: [Lucene.Net] Index and search for a phrase including plus (+) sign.

Posted by Patric Forsgard <pa...@tasteful.se>.
Thank you for the answer. I will try to find out if and how I should use the
analyzer, or whether we need to build our own.

// Patric


On 5 April 2011 21:13, digy digy <di...@gmail.com> wrote:

> Working with "+" can be problematic and you may need to write your own
> custom analyzer.
> You can use code below to see what you are indexing and searching.

Re: [Lucene.Net] Index and search for a phrase including plus (+) sign.

Posted by digy digy <di...@gmail.com>.
Working with "+" can be problematic, and you may need to write your own
custom analyzer.
You can use the code below to see what you are indexing and what you are
searching for.

DIGY


    // Assumes: using System.Text; using Lucene.Net.Analysis;
    // using Lucene.Net.QueryParsers;
    string Test(string docTextToIndex, string textToSearch)
    {
        StringBuilder sb = new StringBuilder();

        // Replace SomeAnalyzer with the analyzer you want to inspect.
        Analyzer analyzer = new SomeAnalyzer();

        // List the tokens the analyzer would put in the index.
        TokenStream ts = analyzer.TokenStream("",
            new System.IO.StringReader(docTextToIndex));

        Lucene.Net.Analysis.Token t = ts.Next();
        while (t != null)
        {
            sb.AppendLine("Token: " + t.TermText());
            t = ts.Next();
        }

        // Show how the query parser interprets the search text.
        sb.AppendLine("QUERY: " + new QueryParser("deffield",
            analyzer).Parse(textToSearch).ToString());

        return sb.ToString();
    }
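
One thing the QUERY line can reveal is that "+" is an operator in the Lucene
query syntax (it marks a required clause), so a literal plus in the search
text has to be escaped with a backslash before parsing. Below is a minimal
hand-rolled sketch of such escaping in plain C#. The class name and character
list are assumptions for illustration; Lucene.Net's QueryParser should offer
a static Escape method for the same purpose, but check that it exists in your
version before relying on it.

```csharp
using System;
using System.Text;

public static class QueryEscapeDemo
{
    // Characters with special meaning in the classic Lucene query syntax
    // (assumed list for this sketch).
    private const string Special = "\\+-!():^[]\"{}~*?|&";

    // Puts a backslash in front of every special character.
    public static string Escape(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (Special.IndexOf(c) >= 0)
                sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Escape("apa+kata"));   // apa\+kata
        Console.WriteLine(Escape("apa + kata")); // apa \+ kata
    }
}
```

Escaping only changes how the query text is parsed. Whether "+" survives as
a token still depends on the analyzer, which is what the Token lines of the
Test method show.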

