Posted to user@lucenenet.apache.org by K a r n a <ka...@gmail.com> on 2010/11/11 07:11:35 UTC

How Can I Deal with Special Chars like #, /,

How can I search when my search term contains special characters such as
#, /, &, $, @, ', (, ), {, }, [, ], |, \, -, +, =, *, !, ~, ` and so on?

For example, I want to search for C#, Asp.net, Accounting/Auditing, and
Banking & Financial.
How can I prepare a search query for these keywords?

Please let me know if anyone knows the solution. I have been doing trial and
error for the past month
and I am still unable to find the solution.


-- 
Thanks & Regards,
Karunaker Reddy V

RE: How Can I Deal with Special Chars like #, /,

Posted by Digy <di...@gmail.com>.
>> Is there any tutorial that explains the differences between analyzers?
You can visit the Lucene Java site for more information about analyzers.

>> If I index these kinds of characters, they will
>> behave somewhat like closed-class words, or even worse in some cases (with
>> very high frequencies). Would it be wise to index them, given that you
>> have more than 100 million documents (say I use WhitespaceAnalyzer)?
It depends. But generally, fewer UNIQUE terms ==> smaller index ==> better
performance.
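
To make the "fewer unique terms" point concrete, here is a minimal C# sketch that counts distinct terms in a sample string with and without normalization. It does not use Lucene.Net at all: UniqueTermCount, Count and the normalize flag are hypothetical names made up for this illustration, and stripping non-letters is only a rough stand-in for what a real analyzer does. It shows term counts only and says nothing about actual index size.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of Lucene.Net.
public static class UniqueTermCount
{
    // Counts distinct whitespace-separated terms. When normalize is true,
    // non-letter characters are stripped and the term is lowercased first.
    public static int Count(string text, bool normalize)
    {
        var seen = new HashSet<string>();
        // A null separator array makes Split break on any whitespace.
        foreach (string raw in text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            string term = normalize
                ? new string(raw.Where(c => char.IsLetter(c)).ToArray()).ToLowerInvariant()
                : raw;
            if (term.Length > 0)
            {
                seen.Add(term);
            }
        }
        return seen.Count;
    }
}
```

For the sample text "Risk, risk RISK risk." the raw count is 4 ("Risk,", "risk", "RISK", "risk.") while the normalized count is 1. Keeping punctuation and case multiplies the number of distinct terms, which is the trade-off to weigh before indexing special characters across a very large collection.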

DIGY

-----Original Message-----
From: Umer Khalid Qureshi [mailto:omer.khalid@gmail.com] 
Sent: Friday, November 12, 2010 7:12 AM
To: lucene-net-user@lucene.apache.org
Subject: Re: How Can I Deal with Special Chars like #, /,

that is quite guiding...

is there any tutorial that differentiates between analyzers???
Also, one more question...if i index these kind of character, they will
somewhat behave like closed class words or even worse in some cases (with
very high frequencies)... will it be wise/ok to index them given that you
have more than 100 million of documents (say i use whitespaceanalyzer).



On Thu, Nov 11, 2010 at 7:44 PM, digy digy <di...@gmail.com> wrote:

> It seems that the problem is not related with parsing or escaping the
> string, rather choosing an inappropriate analyzer for your needs. You can
> not search what you haven't indexed.
>
> You can use  below code to see what is indexed with different type of
> analyzers.
>
> DIGY
>
> Analyzer analyzer = new ......Analyzer();
> TokenStream stream = analyzer.TokenStream("", new
> System.IO.StringReader("your text to be indexed"));
> Token token = stream.Next();
> while (  token !=null )
> {
> Console.WriteLine(token.TermText());
> token = stream.Next();
> }
>
> On Thu, Nov 11, 2010 at 1:58 PM, Umer Khalid Qureshi
> <om...@gmail.com>wrote:
>
> > Well, I am facing the same problem
> > I am though using standardanalyzer as following:
> >
> > analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(new string[] { "" });
> >
> > and the search query i am doing is like  "ri*k fac*r"~3
> > but when i parse it through query parser,
> >
> > QueryParser queryParser = new
> > QueryParser(Lucene.Net.Util.Version.LUCENE_29,"Contents", analyzer);
> > queryParser.Parse("\"ri*k fac*r\"~3");
> >
> > it replaced '*' with empty space and results as "ri k fac r"~3
> >
> > I tried following too
> > queryParser.Parse("\"ri\\*k fac\\*r\"~3");
> > but of no use.
> >
> > Can you guide us how to escape them ?
> >
> > P.S: when i use Whitespaceanalyzer, the parsing become just fine and
> > results
> > as i expect. but i can't use whiteSpaceAnalyzer.
> >
> >
> >
> >
> >
> >
> >
> > 2010/11/11 Pál Barnabás <pb...@gmail.com>
> >
> > > Hi,
> > > Check the 'Escaping Special Characters' section in the query parser
> > > document:
> > > http://lucene.apache.org/java/2_9_1/queryparsersyntax.html
> > >
> > > 2010/11/11 K a r n a <ka...@gmail.com>:
> > > > How can I search when my search term contains #, /, &. $, @,
> > > > ',(,),{,},[,],|,\-,+,=,*,!,~,`....etc
> > > >
> > > > for example If I want to search C#, Asp.net, Accounting/ Audinting,
> > > > Banking & Financial...
> > > > How I can prepare a search query for the above keywords
> > > > ...
> > > > Please let me know if anyone knows the solution...I'm doing trial and
> > > > error from past 1 month....
> > > > still I'm unable to find the solution.
> > > >
> > > >
> > > > --
> > > > Thanks & Regards,
> > > > Karunaker Reddy V
> > > >
> > >
> >
>


Re: How Can I Deal with Special Chars like #, /,

Posted by Umer Khalid Qureshi <om...@gmail.com>.
That is quite helpful.

Is there any tutorial that explains the differences between analyzers?
Also, one more question: if I index these kinds of characters, they will
behave somewhat like closed-class words, or even worse in some cases (with
very high frequencies). Would it be wise to index them, given that you
have more than 100 million documents (say I use WhitespaceAnalyzer)?




Re: How Can I Deal with Special Chars like #, /,

Posted by digy digy <di...@gmail.com>.
It seems that the problem is not related to parsing or escaping the
string, but rather to choosing an analyzer that is inappropriate for your
needs. You cannot search for what you haven't indexed.

You can use the code below to see what is indexed with different types of
analyzers.

DIGY

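// Substitute the analyzer you want to inspect for the elided class name.
Analyzer analyzer = new ......Analyzer();
TokenStream stream = analyzer.TokenStream("", new System.IO.StringReader("your text to be indexed"));
// Print each term the analyzer emits; only these terms can be searched.
Token token = stream.Next();
while (token != null)
{
    Console.WriteLine(token.TermText());
    token = stream.Next();
}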
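
The point that you cannot search for what was never indexed can be sketched without Lucene.Net. The two tokenizers below are toy stand-ins written for this illustration (ToyTokenizers, SplitOnWhitespace and LettersOnly are hypothetical names); they only mimic the broad difference between splitting on whitespace and keeping runs of letters, and real analyzers apply different and more detailed rules. Use the token dump above to check what a real analyzer actually produces.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Toy stand-ins for illustration only; these are not Lucene.Net classes.
public static class ToyTokenizers
{
    // Keeps every non-whitespace run intact, so "C#" survives as one term.
    public static List<string> SplitOnWhitespace(string text)
    {
        // A null separator array makes Split break on any whitespace.
        return new List<string>(
            text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
    }

    // Keeps only runs of letters, lowercased; punctuation ends a term and is dropped.
    public static List<string> LettersOnly(string text)
    {
        var terms = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetter(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                terms.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0)
        {
            terms.Add(current.ToString());
        }
        return terms;
    }
}
```

With the input "C# Asp.net", SplitOnWhitespace yields the terms "C#" and "Asp.net", while LettersOnly yields "c", "asp" and "net". If the index was built the second way, no amount of query escaping can match "C#", because that term was never stored.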


Re: How Can I Deal with Special Chars like #, /,

Posted by Wyatt Barnett <wy...@gmail.com>.
The method you seek is QueryParser.Escape():

QueryParser queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "Contents", analyzer);
queryParser.Parse(QueryParser.Escape("\"ri*k fac*r\"~3"));
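
To see what escaping does to a query string, here is a hand-written C# sketch of the idea. It is not the library's code: QueryEscapeSketch is a hypothetical name, and the character set is an assumption taken from the special-character list in the query parser syntax document linked in this thread. In real code, call QueryParser.Escape() itself.

```csharp
using System;
using System.Text;

// Hand-written stand-in for illustration; use QueryParser.Escape() in real code.
public static class QueryEscapeSketch
{
    // Assumed set of query-syntax characters, based on the query parser
    // syntax document ("&&" and "||" are covered by '&' and '|').
    private const string Special = "\\+-!(){}[]^\"~*?:|&";

    // Prefixes every syntax character with a backslash so the parser
    // treats it as literal text instead of as an operator.
    public static string Escape(string input)
    {
        var sb = new StringBuilder(input.Length * 2);
        foreach (char c in input)
        {
            if (Special.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Two things follow from this sketch. Escape("C#") returns "C#" unchanged, because '#' is not a query-syntax character in that list, so escaping alone cannot help with such terms. Also, escaping the whole string "ri*k fac*r"~3 (including its quotes) gives \"ri\*k fac\*r\"\~3: the quotes and the ~ are escaped along with the asterisks, so the phrase and proximity syntax is then treated as literal text as well.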


Re: How Can I Deal with Special Chars like #, /,

Posted by Umer Khalid Qureshi <om...@gmail.com>.
Well, I am facing the same problem.
I am using StandardAnalyzer, though, as follows:

analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(new string[] { "" });

The search query I am running looks like "ri*k fac*r"~3
but when I parse it through the query parser,

QueryParser queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "Contents", analyzer);
queryParser.Parse("\"ri*k fac*r\"~3");

it replaces each '*' with a space and the result is "ri k fac r"~3

I also tried the following:
queryParser.Parse("\"ri\\*k fac\\*r\"~3");
but to no avail.

Can you guide us on how to escape them?

P.S.: When I use WhitespaceAnalyzer, the parsing works just fine and the
results are as I expect, but I can't use WhitespaceAnalyzer.








Re: How Can I Deal with Special Chars like #, /,

Posted by Pál Barnabás <pb...@gmail.com>.
Hi,
Check the 'Escaping Special Characters' section in the query parser document:
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html
