Posted to solr-user@lucene.apache.org by "EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)" <ex...@us.bosch.com> on 2014/07/18 23:00:16 UTC

text search problem

Hi, below is my text_general field type. When I search for Text:Broadway, it does not return all the records, only a few. But when I search for Text:*Broadway*, it returns more records. When I search with multiple words, like "Broadway Hotel", it may not return the records matching "Broadway", "Hotel", and "Broadway Hotel". Do you have any thoughts on how to handle this type of keyword search? Here is a sample field value:

Text:"Broadway,Vehicle Detailing,Water Systems,Vehicle Detailing,Car Wash Water Recovery"

My field type looks like this.

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="0"/>

    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="0"/>
  </analyzer>
</fieldType>



Do you have any thoughts on this behavior, or on how to get the results I expect?

Thanks

Ravi

Re: text search problem

Posted by Josh Lincoln <jo...@gmail.com>.

Ravi, for the hyphen issue, try setting autoGeneratePhraseQueries="true" on
that fieldType (no re-index needed). As of schema version 1.4, this defaults
to false. One word of caution: autoGeneratePhraseQueries may not work as
expected for languages that aren't whitespace delimited. As Erick mentioned,
the Analysis page will help you verify that your content and your queries are
handled the way you expect them to be.

See this thread for more info on autoGeneratePhraseQueries
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201202.mbox/%3C439F69A3-F292-482B-A102-7C011C576062@gmail.com%3E
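
A minimal sketch of where that attribute goes, assuming the text_general
field type from the original post. The comments inside the two analyzers are
placeholders for the chains as originally posted, which need to be kept; only
the opening tag changes.

```xml
<!-- Sketch only: text_general from the original post with
     autoGeneratePhraseQueries switched on at the fieldType level. -->
<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <!-- charFilter, tokenizer and filters exactly as in the original post -->
  </analyzer>
  <analyzer type="query">
    <!-- charFilter, tokenizer and filters exactly as in the original post -->
  </analyzer>
</fieldType>
```

Since the setting only affects how queries are parsed, the parsed query shown
by the debug output is a reasonable place to confirm that it took effect.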



Re: text search problem

Posted by Erick Erickson <er...@gmail.com>.

Try escaping the hyphen as \-, or enclosing it all
in quotes.

But you _really_ have to spend some time with the debug option
and the admin/analysis page, or you will find endless surprises.

Best,
Erick



RE: text search problem

Posted by "EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)" <ex...@us.bosch.com>.

Thanks for the reply Erick, I will try what you suggested. I have another question along the same lines.

When I have "-" in my description or name, the search results are different. For example:

For "ABC-123", it looks for ABC or 123. I want to treat this search as an exact match, i.e. if my document has ABC-123 then I should get that result.

When I check with &hl=on, it highlights <em>ABC</em> and returns those results. How can I avoid this situation?

Thanks

Ravi



Re: text search problem

Posted by Erick Erickson <er...@gmail.com>.

Try adding &debug=all to the query and see what the parsed form of the query
is. Likely you're either
1> using phrase queries, so "broadway hotel" requires both words in the text,
or
2> not using phrases, in which case you're searching for the AND of the two
terms.

But debug=all will show you.

Plus, take a look at the admin/analysis page; your tokenization may not be
what you expect.

Best,
Erick
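
One thing that may be worth checking on the Analysis page is the sample value
from the original post. WhitespaceTokenizerFactory splits only on whitespace,
so "Broadway,Vehicle" reaches the filters as a single token, and with
generateWordParts="0" and catenateWords="1" the word delimiter filter likely
emits only the joined form. That would be consistent with Text:Broadway
missing records that Text:*Broadway* finds. The following is a sketch of an
index analyzer under that assumption, not something confirmed in this thread.
It also lowercases before stemming, because KStemFilter expects lowercased
input.

```xml
<!-- Sketch only, not confirmed in this thread: an index-time chain that
     keeps the individual words of comma-joined values searchable. -->
<analyzer type="index">
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- Lowercase first: KStemFilter expects lowercased input. -->
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- With generateWordParts and generateNumberParts set to 1, the parts
       "broadway" and "vehicle" are emitted for "broadway,vehicle" in
       addition to the catenated form. -->
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="0"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.KStemFilterFactory"/>
</analyzer>
```

A change to the index analyzer needs a re-index, and the query analyzer would
need to be brought in line with it. Running the same sample value through the
Analysis page before and after the change shows whether the tokens actually
differ as assumed here.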

