Posted to solr-user@lucene.apache.org by bhaskar chandrasekar <ba...@yahoo.co.in> on 2009/08/26 19:28:45 UTC

Pattern matching in Solr

Hi,

Can anyone help me with the scenarios below?

Scenario 1:

Assume that I give, say, "Google" as the input string.
I am using Carrot with Solr; Carrot is only for front-end display.
The issue is this:
if I give "BHASKAR" as the input string,
it should give me search results pertaining to BHASKAR only.
 Select * from MASTER where name ="Bhaskar";
 Example: It should not display search results such as "ChandarBhaskar" or
 "BhaskarC".
 It should display Bhaskar only.

Scenario 2:
 Select * from MASTER where name like "%BHASKAR%";
 It should display records containing the word BHASKAR.
 Ex: Bhaskar
 ChandarBhaskar
 BhaskarC
 Bhaskarabc

 How can I achieve Scenario 1 in Solr?

Regards
Bhaskar



      

Re: Pattern matching in Solr

Posted by bhaskar chandrasekar <ba...@yahoo.co.in>.
Hi,

In the schema.xml file, I am not able to find splitOnCaseChange="1".
I am not looking for a case-sensitive search.
Which file are you referring to?
I am looking for exact-match search only.

Also, for scenario 2, which page of the Solr wiki covers the
KeywordTokenizerFactory and EdgeNGramFilterFactory?
 
Regards
Bhaskar






      

Re: Pattern matching in Solr

Posted by Avlesh Singh <av...@gmail.com>.
> In the schema.xml file, I am not able to find splitOnCaseChange="1".

Unless you have modified the stock definition of the "text" field type in
your core's schema.xml, you should be able to find this property set on the
WordDelimiterFilterFactory. Read more here -
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089

> Also, for scenario 2, which page of the Solr wiki covers the
> KeywordTokenizerFactory and EdgeNGramFilterFactory?

Google for these two.

Cheers
Avlesh


Re: Pattern matching in Solr

Posted by bhaskar chandrasekar <ba...@yahoo.co.in>.
 
Hi,

In the schema.xml file, I am not able to find splitOnCaseChange="1".
I am not looking for a case-sensitive search.
Which file are you referring to?
I am looking for exact-match search only.

Also, for scenario 2, which page of the Solr wiki covers the
KeywordTokenizerFactory and EdgeNGramFilterFactory?
 
Regards
Bhaskar


Re: Pattern matching in Solr

Posted by Avlesh Singh <av...@gmail.com>.
You could have continued in your previous thread (
http://www.lucidimagination.com/search/document/31c1ebcedd4442b/exact_pattern_search_in_solr),
Bhaskar.

In your scenario one, you need an exact token match, right? The results you
are getting are expected if your field type is "text". Look for the
"WordDelimiterFilterFactory" in the field type definition for the text
field inside schema.xml. You'll find an attribute splitOnCaseChange="1".
Because of this, "ChandarBhaskar" is split into two tokens, "Chandar" and
"Bhaskar", and hence the matches. If this behaviour is not desired, you may
choose to set the attribute to "0".
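
For illustration, the filter line in question looks roughly like the sketch
below. The attribute values other than splitOnCaseChange are recalled from the
example schema.xml and may differ in your copy, so treat them as assumptions.
As far as I know, splitOnCaseChange defaults to 1 when it is absent, so an
explicit "0" is the safer way to switch the splitting off.

```xml
<!-- Inside the "text" fieldType's analyzer in schema.xml (sketch only). -->
<!-- With splitOnCaseChange="1", "ChandarBhaskar" is indexed as the two -->
<!-- tokens "Chandar" and "Bhaskar". With "0" it stays a single token. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"
        splitOnCaseChange="0"/>
```

The stock "text" type usually declares this filter in both the index and the
query analyzer, and a change to index-time analysis only takes effect for
documents that are reindexed afterwards.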

For your scenario two, you may want to look at the KeywordTokenizerFactory
and EdgeNGramFilterFactory on the Solr wiki.
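
A minimal sketch of how these two pieces could be combined is shown below. The
field type name "text_prefix" and the gram sizes are hypothetical choices, not
something from the stock schema.

```xml
<!-- Hypothetical field type: the whole value is kept as one token, -->
<!-- lowercased, and expanded at index time into its leading substrings. -->
<fieldType name="text_prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "Bhaskarabc" yields b, bh, bha, ... up to maxGramSize characters -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-grams at query time: the query is matched as typed, lowercased -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that front edge n-grams only cover values that start with the query, such
as "BhaskarC" or "Bhaskarabc". To also match "ChandarBhaskar", as in the
like "%BHASKAR%" example, solr.NGramFilterFactory (which emits substrings from
every position) would probably be the closer fit, at the cost of a larger index.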

Generally, for all such use cases people create multiple fields in their
schema storing the same data analyzed in different ways.
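
As a rough sketch of that multi-field approach for scenario one: the names
"string_lc" and "name_exact" below are made up for this example, and "name" is
assumed to be the existing field being searched.

```xml
<!-- Hypothetical type: whole value as a single lowercased token, -->
<!-- so matching is exact but not case-sensitive. -->
<fieldType name="string_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The same data indexed twice, analyzed in two different ways. -->
<field name="name" type="text" indexed="true" stored="true"/>
<field name="name_exact" type="string_lc" indexed="true" stored="false"/>
<copyField source="name" dest="name_exact"/>
```

With something like this in place, a query such as name_exact:bhaskar should
match only documents whose whole name is "Bhaskar", while queries against
"name" keep the looser text matching.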

Cheers
Avlesh
