Posted to solr-user@lucene.apache.org by sunshine glass <su...@gmail.com> on 2014/07/30 16:38:16 UTC

Re: Searching words with spaces for word without spaces in solr

This is the new configuration:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
                outputUnigrams="true" tokenSeparator=""/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"
                language="English" protected="protwords.txt"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="stemmed_synonyms_text_prime_index.txt" ignoreCase="true"
                expand="true"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords_text_prime_search.txt"
                enablePositionIncrements="true"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
                outputUnigrams="true" tokenSeparator=""/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
        <filter class="solr.SnowballPorterFilterFactory"
                language="English" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

These are the current docs in my index:

<result name="response" numFound="3" start="0">
<doc>
<str name="id">2</str>
<str name="title">Icecream</str>
<long name="_version_">1475063961342705664</long>
</doc>
<doc>
<str name="id">3</str>
<str name="title">Ice-cream</str>
<long name="_version_">1475063961344802816</long>
</doc>
<doc>
<str name="id">1</str>
<str name="title">Ice Cream</str>
<long name="_version_">1475063961203245056</long>
</doc>
</result>
</response>

Query:
http://localhost:8983/solr/collection1/select?q=title:ice+cream&debug=true

Response:

<result name="response" numFound="2" start="0">
<doc>
<str name="id">1</str>
<str name="title">Ice Cream</str>
<long name="_version_">1475063961203245056</long>
</doc>
<doc>
<str name="id">3</str>
<str name="title">Ice-cream</str>
<long name="_version_">1475063961344802816</long>
</doc>
</result>
<lst name="debug">
<str name="rawquerystring">title:ice cream</str>
<str name="querystring">title:ice cream</str>
<str name="parsedquery">
(+(title:ice DisjunctionMaxQuery((title:cream))))/no_coord
</str>
<str name="parsedquery_toString">+(title:ice (title:cream))</str>
<lst name="explain">
<str name="1">
0.875 = (MATCH) sum of: 0.4375 = (MATCH) weight(title:ice in 0)
[DefaultSimilarity], result of: 0.4375 = score(doc=0,freq=2.0 =
termFreq=2.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 =
idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.61871845 = fieldWeight
in 0, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 =
termFreq=2.0 1.0 = idf(docFreq=2, maxDocs=3) 0.4375 = fieldNorm(doc=0)
0.4375 = (MATCH) weight(title:cream in 0) [DefaultSimilarity], result of:
0.4375 = score(doc=0,freq=2.0 = termFreq=2.0 ), product of: 0.70710677 =
queryWeight, product of: 1.0 = idf(docFreq=2, maxDocs=3) 0.70710677 =
queryNorm 0.61871845 = fieldWeight in 0, product of: 1.4142135 =
tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 1.0 = idf(docFreq=2,
maxDocs=3) 0.4375 = fieldNorm(doc=0)
</str>
<str name="3">
0.70710677 = (MATCH) sum of: 0.35355338 = (MATCH) weight(title:ice in 2)
[DefaultSimilarity], result of: 0.35355338 = score(doc=2,freq=1.0 =
termFreq=1.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 =
idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.5 = fieldWeight in 2,
product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.0 =
idf(docFreq=2, maxDocs=3) 0.5 = fieldNorm(doc=2) 0.35355338 = (MATCH)
weight(title:cream in 2) [DefaultSimilarity], result of: 0.35355338 =
score(doc=2,freq=1.0 = termFreq=1.0 ), product of: 0.70710677 =
queryWeight, product of: 1.0 = idf(docFreq=2, maxDocs=3) 0.70710677 =
queryNorm 0.5 = fieldWeight in 2, product of: 1.0 = tf(freq=1.0), with freq
of: 1.0 = termFreq=1.0 1.0 = idf(docFreq=2, maxDocs=3) 0.5 =
fieldNorm(doc=2)
</str>
</lst>

It is still not working: the query returns "Ice Cream" and "Ice-cream" but
not "Icecream".


On Fri, May 30, 2014 at 9:21 PM, Erick Erickson <er...@gmail.com>
wrote:

> I'd spend some time with the admin/analysis page to understand the exact
> tokenization going on here. For instance, sequencing the
> ShingleFilterFactory before the WordDelimiterFilterFactory may produce
> "interesting" results. And then throwing the Snowball factory at it and
> putting synonyms in front... I suspect you're not indexing or searching
> what you think you are.
>
> Second, what happens when you query with &debug=query? That'll show you
> what the search string looks like.
>
> If that doesn't help, please post the results of looking at those things
> here; that'll provide some information for us to work with.
>
> Best,
> Erick
>
>
> On Fri, May 30, 2014 at 3:32 AM, sunshine glass <
> sunshineglassof2day@gmail.com> wrote:
>
> > Hi Folks,
> >
> > Any updates?
> >
> >
> > On Wed, May 28, 2014 at 12:13 PM, sunshine glass <
> > sunshineglassof2day@gmail.com> wrote:
> >
> > > Dear Team,
> > >
> > > How can I handle compound-word searches in Solr?
> > > How can I search for "hand bag" if I have "handbag" in my index? When
> > > using a shingle filter in the query analyzer, the query "ice cube"
> > > creates three tokens: "ice", "cube", and "icecube". Only "ice" and
> > > "cube" are searched, not "icecube", i.e. it is not working for the pair
> > > even though I am using the shingle filter.
> > >
> > > Here's the schema config.
> > >
> > >     <fieldType name="text" class="solr.TextField"
> > >                positionIncrementGap="100">
> > >       <analyzer type="index">
> > >         <filter class="solr.SynonymFilterFactory"
> > >                 synonyms="synonyms_text_prime_index.txt" ignoreCase="true"
> > >                 expand="true"/>
> > >         <charFilter class="solr.HTMLStripCharFilterFactory"/>
> > >         <tokenizer class="solr.StandardTokenizerFactory"/>
> > >         <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
> > >                 outputUnigrams="true" tokenSeparator=""/>
> > >         <filter class="solr.WordDelimiterFilterFactory"
> > >                 catenateWords="1" catenateNumbers="1" catenateAll="1"
> > >                 preserveOriginal="1" generateWordParts="1"
> > >                 generateNumberParts="1"/>
> > >         <filter class="solr.LowerCaseFilterFactory"/>
> > >         <filter class="solr.SnowballPorterFilterFactory"
> > >                 language="English" protected="protwords.txt"/>
> > >       </analyzer>
> > >       <analyzer type="query">
> > >         <tokenizer class="solr.StandardTokenizerFactory"/>
> > >         <filter class="solr.SynonymFilterFactory"
> > >                 synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> > >         <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
> > >                 outputUnigrams="true" tokenSeparator=""/>
> > >         <filter class="solr.WordDelimiterFilterFactory"
> > >                 preserveOriginal="1"/>
> > >         <filter class="solr.LowerCaseFilterFactory"/>
> > >         <filter class="solr.SnowballPorterFilterFactory"
> > >                 language="English" protected="protwords.txt"/>
> > >       </analyzer>
> > >     </fieldType>
> > >
> > >    Any help is appreciated.
> > >
> > >
> >
>

Re: Searching words with spaces for word without spaces in solr

Posted by Umesh Prasad <um...@gmail.com>.

I would suggest breaking the problem into smaller parts:
1. Identify variations (say, compound words) offline, where you can combine
   multiple sources to ensure much better quality.
2. Expand the user query at search time using those sources, so the query
   becomes
       icecream OR (ice cream)   (with q.op=AND)
   Parse the query using the Lucene query parser. If you are using
   dismax/edismax, then I would suggest plugging in a custom query parser
   that combines queries from the Lucene query parser and the dismax query
   parser (dismax does not support the full Lucene query syntax).
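
A minimal sketch of step 2, assuming the standard Lucene query parser and the
"title" field from the sample documents in this thread. The handler name
"/compound" is hypothetical and used only for illustration; defType, q.op and
df are standard Solr request parameters.

```xml
<!-- solrconfig.xml: hypothetical handler, shown only for illustration -->
<requestHandler name="/compound" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- standard Lucene query parser, so OR and parentheses are honored -->
    <str name="defType">lucene</str>
    <!-- terms without an explicit operator are all required -->
    <str name="q.op">AND</str>
    <!-- field used for terms that carry no field prefix -->
    <str name="df">title</str>
  </lst>
</requestHandler>
```

With defaults like these, the application would send the already-expanded
query, for example q=icecream OR (ice cream), which the parser should read as
icecream OR (ice AND cream). Knowing that "ice cream" and "icecream" are
variants still has to come from the offline list built in step 1.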








-- 
---
Thanks & Regards
Umesh Prasad

Re: Searching words with spaces for word without spaces in solr

Posted by sunshine glass <su...@gmail.com>.

*Point 1:*
On Thu, Jul 31, 2014 at 9:32 PM, Dyer, James <Ja...@ingramcontent.com>
 wrote:

> If a user is searching on "ice cream" but your index has "icecream", you
> can treat this like a spelling error. WordBreakSolrSpellChecker would
> identify the fact that while "ice cream" is not in your index, "icecream"
> is, and then you can re-query for the corrected version without the space.
>

What if I have 1M records for "ice cream" and the same number for "icecream"?
Then this trick will not work. What I want in this case is that whether I
search for "ice cream" or "icecream", Solr returns all 2M results.

*Point 2:*
On Thu, Jul 31, 2014 at 9:32 PM, Dyer, James <Ja...@ingramcontent.com>
 wrote:
> The problem with solving this with analyzers is that you can analyze
> "ice-cream" as either "ice cream" or "icecream" (split or catenate on
> hyphen). You can even analyze "IceCream" as "Ice Cream" (split on case
> change). But how is your analyzer going to know that "icecream" should
> index as two tokens: "ice" "cream"? You're asking analysis to do too much
> in this case. This is where spellcheck can bridge the gap.

I don't want "icecream" to be indexed as "ice" and "cream"; I agree that
this is not feasible. What I am looking for is a way to create shingles at
query time as well. In other words, when querying "ice cream", can't it
search for "ice" or "cream" or "icecream"?
That is, form shingles at query time.

There is a long list of such words in my index, so I don't want to implement
this via the synonym filter factory.
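
For reference, the index-time synonym approach that James Dyer describes in
the quoted message could be sketched roughly as below. This is only an
illustration under assumptions: the field type name "text_compound" and the
file name "compound_synonyms.txt" are made up, and the file would need one
line per compound.

```xml
<!-- schema.xml: illustrative field type; the names are hypothetical -->
<fieldType name="text_compound" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- compound_synonyms.txt would contain lines such as:
           icecream, ice cream
           handbag, hand bag
         With expand="true", each form is indexed alongside the other. -->
    <filter class="solr.SynonymFilterFactory" synonyms="compound_synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the expansion happens at index time, documents have to be reindexed
whenever the synonym file changes, which is part of the cost of maintaining a
long list this way.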


On Thu, Jul 31, 2014 at 9:32 PM, Dyer, James <Ja...@ingramcontent.com>
wrote:

> If a user is searching on "ice cream" but your index has "icecream", you
> can treat this like a spelling error. WordBreakSolrSpellChecker would
> identify the fact that while "ice cream" is not in your index, "icecream"
> is, and then you can re-query for the corrected version without the space.
>
> The problem with solving this with analyzers is that you can analyze
> "ice-cream" as either "ice cream" or "icecream" (split or catenate on
> hyphen). You can even analyze "IceCream" as "Ice Cream" (split on case
> change). But how is your analyzer going to know that "icecream" should
> index as two tokens: "ice" "cream"? You're asking analysis to do too much
> in this case. This is where spellcheck can bridge the gap.
>
> Of course, if you have a discrete list of words you want split like this,
> then you can do it with analysis using index-time synonyms.  In this case,
> you need to provide it with the list.  See
> https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
> for more information.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -----Original Message-----
> From: sunshine glass [mailto:sunshineglassof2day@gmail.com]
> Sent: Thursday, July 31, 2014 10:32 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Searching words with spaces for word without spaces in solr
>
> I am not clear on this. This link is about spell checking. Can you
> elaborate?
>
>
> On Wed, Jul 30, 2014 at 9:17 PM, Dyer, James <James.Dyer@ingramcontent.com>
> wrote:
>
> > In addition to the analyzer configuration you're using, you might also
> > want to use WordBreakSolrSpellChecker to catch possible matches that can't
> > easily be solved through analysis. For more information, see the section
> > for it at
> > https://cwiki.apache.org/confluence/display/solr/Spell+Checking
> >
> > James Dyer
> > Ingram Content Group
> > (615) 213-4311
> >

RE: Searching words with spaces for word without spaces in solr

Posted by "Dyer, James" <Ja...@ingramcontent.com>.

If a user is searching on "ice cream" but your index has "icecream", you can treat this like a spelling error.  WordBreakSolrSpellChecker would identify that, while "ice cream" is not in your index, "icecream" is, and then you can re-query for the corrected version without the space.
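
As a rough sketch of what that could look like in solrconfig.xml (not taken from the poster's setup): the component below registers a WordBreakSolrSpellChecker dictionary and a handler that runs it after the main query. The dictionary name "wordbreak", the use of the "title" field from the example docs, the "/select" handler name, and the maxChanges value are all assumptions for illustration.

```xml
<!-- Sketch only: names and values are illustrative assumptions -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <!-- field whose indexed terms are consulted; "title" is assumed here -->
    <str name="field">title</str>
    <!-- combineWords lets "ice cream" be suggested as "icecream" -->
    <str name="combineWords">true</str>
    <!-- breakWords lets "icecream" be suggested as "ice cream" -->
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <!-- collations give a rewritten query string you can re-submit -->
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

With something like this in place, the client would look at the spellcheck section of the response and, if a collation is offered, issue a second query with it.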

The problem with solving this with analyzers is that you can analyze "ice-cream" as either "ice cream" or "icecream" (split or catenate on hyphen).  You can even analyze "IceCream" as "Ice Cream" (split on case change).  But how is your analyzer going to know that "icecream" should index as two tokens, "ice" and "cream"?  You're asking analysis to do too much in this case.  This is where spellcheck can bridge the gap.
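
To make the analysis side concrete, here is a minimal sketch of a field type that uses only the WordDelimiterFilterFactory options discussed above. The field type name "text_wdf" is hypothetical, and the comments describe the behaviour I would expect; the admin/analysis page is the place to confirm it.

```xml
<!-- Sketch only: "text_wdf" is a hypothetical field type name -->
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- WhitespaceTokenizer keeps "ice-cream" intact so the filter below sees the hyphen -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- generateWordParts: "ice-cream" yields "ice" and "cream"
         catenateWords:     "ice-cream" also yields "icecream"
         splitOnCaseChange: "IceCream" yields "Ice" and "Cream"
         "icecream" has no delimiter or case change, so it stays one token -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The last comment is the gap: nothing in the token itself tells the filter where "icecream" should be broken.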

Of course, if you have a discrete list of words you want split like this, then you can do it with analysis using index-time synonyms.  In this case, you need to provide it with the list.  See https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory for more information.
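
A minimal sketch of the index-time synonym approach, assuming a hand-maintained file of compounds. The field type name "text_compound" and the file name "compound_synonyms.txt" are hypothetical; the sample entries are shown in a comment.

```xml
<!-- Sketch only: field type and file names are hypothetical.
     compound_synonyms.txt would hold one entry per compound, e.g.:
       icecream, ice cream
       handbag, hand bag
-->
<fieldType name="text_compound" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- expand="true" indexes every form listed on a line, so a doc
         containing "icecream" is also indexed under "ice" and "cream" -->
    <filter class="solr.SynonymFilterFactory" synonyms="compound_synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Synonyms are applied only at index time here, so changes to the file require re-indexing the affected documents.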

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: sunshine glass [mailto:sunshineglassof2day@gmail.com] 
Sent: Thursday, July 31, 2014 10:32 AM
To: solr-user@lucene.apache.org
Subject: Re: Searching words with spaces for word without spaces in solr

I am not clear about this. This link is about spell checking. Can you
elaborate?


On Wed, Jul 30, 2014 at 9:17 PM, Dyer, James <Ja...@ingramcontent.com>
wrote:

> In addition to the analyzer configuration you're using, you might want to
> also use WordBreakSolrSpellChecker to catch possible matches that can't
> easily be solved through analysis.  For more information, see the section
> for it at https://cwiki.apache.org/confluence/display/solr/Spell+Checking
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
> -----Original Message-----
> From: sunshine glass [mailto:sunshineglassof2day@gmail.com]
> Sent: Wednesday, July 30, 2014 9:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Searching words with spaces for word without spaces in solr
>
> This is the new configuration:
>
>     <fieldType name="text" class="solr.TextField"
> > positionIncrementGap="100">
> >       <analyzer type="index">
> >         <charFilter class="solr.HTMLStripCharFilterFactory"/>
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
> > outputUnigrams="true" tokenSeparator=""/>
> >         <filter class="solr.WordDelimiterFilterFactory"
> > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <filter class="solr.SnowballPorterFilterFactory"
> > language="English" protected="protwords.txt"/>
> >           <filter class="solr.SynonymFilterFactory"
> > synonyms="stemmed_synonyms_text_prime_index.txt" ignoreCase="true"
> > expand="true"/>
> >       </analyzer>
> >       <analyzer type="query">
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <filter class="solr.StopFilterFactory" ignoreCase="true"
> > words="stopwords_text_prime_search.txt" enablePositionIncrements="true"
> />
> >         <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
> > outputUnigrams="true" tokenSeparator=""/>
> >         <filter class="solr.WordDelimiterFilterFactory"
> > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
> >         <filter class="solr.SnowballPorterFilterFactory"
> > language="English" protected="protwords.txt"/>
> >       </fieldType>
> >
> >
> These are current docs in my index:
>
> <result name="response" numFound="3" start="0">
> <doc>
> <str name="id">2</str>
> <str name="title">Icecream</str>
> <long name="_version_">1475063961342705664</long>
> </doc>
> <doc>
> <str name="id">3</str>
> <str name="title">Ice-cream</str>
> <long name="_version_">1475063961344802816</long>
> </doc>
> <doc>
> <str name="id">1</str>
> <str name="title">Ice Cream</str>
> <long name="_version_">1475063961203245056</long>
> </doc>
> </result>
> </response>
>
> Query:
> http://localhost:8983/solr/collection1/select?q=title:ice+cream&debug=true
>
> Response:
>
> <result name="response" numFound="2" start="0">
> <doc>
> <str name="id">1</str>
> <str name="title">Ice Cream</str>
> <long name="_version_">1475063961203245056</long>
> </doc>
> <doc>
> <str name="id">3</str>
> <str name="title">Ice-cream</str>
> <long name="_version_">1475063961344802816</long>
> </doc>
> </result>
> <lst name="debug">
> <str name="rawquerystring">title:ice cream</str>
> <str name="querystring">title:ice cream</str>
> <str name="parsedquery">
> (+(title:ice DisjunctionMaxQuery((title:cream))))/no_coord
> </str>
> <str name="parsedquery_toString">+(title:ice (title:cream))</str>
> <lst name="explain">
> <str name="1">
> 0.875 = (MATCH) sum of: 0.4375 = (MATCH) weight(title:ice in 0)
> [DefaultSimilarity], result of: 0.4375 = score(doc=0,freq=2.0 =
> termFreq=2.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 =
> idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.61871845 = fieldWeight
> in 0, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 =
> termFreq=2.0 1.0 = idf(docFreq=2, maxDocs=3) 0.4375 = fieldNorm(doc=0)
> 0.4375 = (MATCH) weight(title:cream in 0) [DefaultSimilarity], result of:
> 0.4375 = score(doc=0,freq=2.0 = termFreq=2.0 ), product of: 0.70710677 =
> queryWeight, product of: 1.0 = idf(docFreq=2, maxDocs=3) 0.70710677 =
> queryNorm 0.61871845 = fieldWeight in 0, product of: 1.4142135 =
> tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 1.0 = idf(docFreq=2,
> maxDocs=3) 0.4375 = fieldNorm(doc=0)
> </str>
> <str name="3">
> 0.70710677 = (MATCH) sum of: 0.35355338 = (MATCH) weight(title:ice in 2)
> [DefaultSimilarity], result of: 0.35355338 = score(doc=2,freq=1.0 =
> termFreq=1.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 =
> idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.5 = fieldWeight in 2,
> product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.0 =
> idf(docFreq=2, maxDocs=3) 0.5 = fieldNorm(doc=2) 0.35355338 = (MATCH)
> weight(title:cream in 2) [DefaultSimilarity], result of: 0.35355338 =
> score(doc=2,freq=1.0 = termFreq=1.0 ), product of: 0.70710677 =
> queryWeight, product of: 1.0 = idf(docFreq=2, maxDocs=3) 0.70710677 =
> queryNorm 0.5 = fieldWeight in 2, product of: 1.0 = tf(freq=1.0), with freq
> of: 1.0 = termFreq=1.0 1.0 = idf(docFreq=2, maxDocs=3) 0.5 =
> fieldNorm(doc=2)
> </str>
> </lst>
>
> Still not working ????
>
>
> On Fri, May 30, 2014 at 9:21 PM, Erick Erickson <er...@gmail.com>
> wrote:
>
> > I'd spend some time with the admin/analysis page to understand the exact
> > tokenization going on here. For instance, sequencing the
> > shinglefilterfactory before worddelimiterfilterfactory may produce
> > "interesting" results. And then throwing the Snowball factory at it and
> > putting synonyms in front.... I suspect you're not indexing or searching
> > what you think you are.
> >
> > Second, what happens when you query with &debug=query? That'll show you
> > what the search string looks like.
> >
> > If that doesn't help, please post the results of looking at those things
> > here, that'll provide some information for us to work with.
> >
> > Best,
> > Erick
> >
> >
> > On Fri, May 30, 2014 at 3:32 AM, sunshine glass <
> > sunshineglassof2day@gmail.com> wrote:
> >
> > > Hi Folks,
> > >
> > > Any updates ??
> > >
> > >
> > > On Wed, May 28, 2014 at 12:13 PM, sunshine glass <
> > > sunshineglassof2day@gmail.com> wrote:
> > >
> > > > Dear Team,
> > > >
> > > > How can I handle compound word searches in solr ?.
> > > > How can i search "hand bag" if I have "handbag" in my index. While
> > using
> > > > shingle in query analyzer, the query "ice cube" creates three tokens
> as
> > > > "ice","cube", "icecube". Only ice and cubes are searched but not
> > > > "icecubes".i.e not working for pair though I am using shingle filter.
> > > >
> > > > Here's the schema config.
> > > >
> > > >
> > > >    1.  <fieldType name="text" class="solr.TextField"
> > > >    positionIncrementGap="100">
> > > >    2.       <analyzer type="index">
> > > >    3.         <filter class="solr.SynonymFilterFactory"
> > > >    synonyms="synonyms_text_prime_index.txt" ignoreCase="true"
> > > expand="true"/>
> > > >    4.         <charFilter class="solr.HTMLStripCharFilterFactory"/>
> > > >    5.         <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >    6.          <filter class="solr.ShingleFilterFactory"
> > > >    maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
> > > >    7.          <filter class="solr.WordDelimiterFilterFactory"
> > > >    catenateWords="1" catenateNumbers="1" catenateAll="1"
> > > preserveOriginal="1"
> > > >    generateWordParts="1" generateNumberParts="1"/>
> > > >    8.         <filter class="solr.LowerCaseFilterFactory"/>
> > > >    9.         <filter class="solr.SnowballPorterFilterFactory"
> > > >    language="English" protected="protwords.txt"/>
> > > >    10.       </analyzer>
> > > >    11.       <analyzer type="query">
> > > >    12.         <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >    13.         <filter class="solr.SynonymFilterFactory"
> > > >    synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> > > >    14.         <filter class="solr.ShingleFilterFactory"
> > > >    maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
> > > >    15.         <filter class="solr.WordDelimiterFilterFactory"
> > > >    preserveOriginal="1"/>
> > > >    16.         <filter class="solr.LowerCaseFilterFactory"/>
> > > >    17.         <filter class="solr.SnowballPorterFilterFactory"
> > > >    language="English" protected="protwords.txt"/>
> > > >    18.       </analyzer>
> > > >    19.     </fieldType>
> > > >
> > > >    Any help is appreciated.
> > > >
> > > >
> > >
> >
>

Re: Searching words with spaces for word without spaces in solr

Posted by sunshine glass <su...@gmail.com>.

I am not clear about this. This link is about spell checking. Can you
elaborate?


On Wed, Jul 30, 2014 at 9:17 PM, Dyer, James <Ja...@ingramcontent.com>
wrote:

> In addition to the analyzer configuration you're using, you might want to
> also use WordBreakSolrSpellChecker to catch possible matches that can't
> easily be solved through analysis.  For more information, see the section
> for it at https://cwiki.apache.org/confluence/display/solr/Spell+Checking
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
> -----Original Message-----
> From: sunshine glass [mailto:sunshineglassof2day@gmail.com]
> Sent: Wednesday, July 30, 2014 9:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Searching words with spaces for word without spaces in solr
>
> This is the new configuration:
>
>     <fieldType name="text" class="solr.TextField"
> > positionIncrementGap="100">
> >       <analyzer type="index">
> >         <charFilter class="solr.HTMLStripCharFilterFactory"/>
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
> > outputUnigrams="true" tokenSeparator=""/>
> >         <filter class="solr.WordDelimiterFilterFactory"
> > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <filter class="solr.SnowballPorterFilterFactory"
> > language="English" protected="protwords.txt"/>
> >           <filter class="solr.SynonymFilterFactory"
> > synonyms="stemmed_synonyms_text_prime_index.txt" ignoreCase="true"
> > expand="true"/>
> >       </analyzer>
> >       <analyzer type="query">
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <filter class="solr.StopFilterFactory" ignoreCase="true"
> > words="stopwords_text_prime_search.txt" enablePositionIncrements="true"
> />
> >         <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
> > outputUnigrams="true" tokenSeparator=""/>
> >         <filter class="solr.WordDelimiterFilterFactory"
> > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
> >         <filter class="solr.SnowballPorterFilterFactory"
> > language="English" protected="protwords.txt"/>
> >       </fieldType>
> >
> >
> These are current docs in my index:
>
> <result name="response" numFound="3" start="0">
> <doc>
> <str name="id">2</str>
> <str name="title">Icecream</str>
> <long name="_version_">1475063961342705664</long>
> </doc>
> <doc>
> <str name="id">3</str>
> <str name="title">Ice-cream</str>
> <long name="_version_">1475063961344802816</long>
> </doc>
> <doc>
> <str name="id">1</str>
> <str name="title">Ice Cream</str>
> <long name="_version_">1475063961203245056</long>
> </doc>
> </result>
> </response>
>
> Query:
> http://localhost:8983/solr/collection1/select?q=title:ice+cream&debug=true
>
> Response:
>
> <result name="response" numFound="2" start="0">
> <doc>
> <str name="id">1</str>
> <str name="title">Ice Cream</str>
> <long name="_version_">1475063961203245056</long>
> </doc>
> <doc>
> <str name="id">3</str>
> <str name="title">Ice-cream</str>
> <long name="_version_">1475063961344802816</long>
> </doc>
> </result>
> <lst name="debug">
> <str name="rawquerystring">title:ice cream</str>
> <str name="querystring">title:ice cream</str>
> <str name="parsedquery">
> (+(title:ice DisjunctionMaxQuery((title:cream))))/no_coord
> </str>
> <str name="parsedquery_toString">+(title:ice (title:cream))</str>
> <lst name="explain">
> <str name="1">
> 0.875 = (MATCH) sum of: 0.4375 = (MATCH) weight(title:ice in 0)
> [DefaultSimilarity], result of: 0.4375 = score(doc=0,freq=2.0 =
> termFreq=2.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 =
> idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.61871845 = fieldWeight
> in 0, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 =
> termFreq=2.0 1.0 = idf(docFreq=2, maxDocs=3) 0.4375 = fieldNorm(doc=0)
> 0.4375 = (MATCH) weight(title:cream in 0) [DefaultSimilarity], result of:
> 0.4375 = score(doc=0,freq=2.0 = termFreq=2.0 ), product of: 0.70710677 =
> queryWeight, product of: 1.0 = idf(docFreq=2, maxDocs=3) 0.70710677 =
> queryNorm 0.61871845 = fieldWeight in 0, product of: 1.4142135 =
> tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 1.0 = idf(docFreq=2,
> maxDocs=3) 0.4375 = fieldNorm(doc=0)
> </str>
> <str name="3">
> 0.70710677 = (MATCH) sum of: 0.35355338 = (MATCH) weight(title:ice in 2)
> [DefaultSimilarity], result of: 0.35355338 = score(doc=2,freq=1.0 =
> termFreq=1.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 =
> idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.5 = fieldWeight in 2,
> product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.0 =
> idf(docFreq=2, maxDocs=3) 0.5 = fieldNorm(doc=2) 0.35355338 = (MATCH)
> weight(title:cream in 2) [DefaultSimilarity], result of: 0.35355338 =
> score(doc=2,freq=1.0 = termFreq=1.0 ), product of: 0.70710677 =
> queryWeight, product of: 1.0 = idf(docFreq=2, maxDocs=3) 0.70710677 =
> queryNorm 0.5 = fieldWeight in 2, product of: 1.0 = tf(freq=1.0), with freq
> of: 1.0 = termFreq=1.0 1.0 = idf(docFreq=2, maxDocs=3) 0.5 =
> fieldNorm(doc=2)
> </str>
> </lst>
>
> Still not working ????
>
>
> On Fri, May 30, 2014 at 9:21 PM, Erick Erickson <er...@gmail.com>
> wrote:
>
> > I'd spend some time with the admin/analysis page to understand the exact
> > tokenization going on here. For instance, sequencing the
> > shinglefilterfactory before worddelimiterfilterfactory may produce
> > "interesting" results. And then throwing the Snowball factory at it and
> > putting synonyms in front.... I suspect you're not indexing or searching
> > what you think you are.
> >
> > Second, what happens when you query with &debug=query? That'll show you
> > what the search string looks like.
> >
> > If that doesn't help, please post the results of looking at those things
> > here, that'll provide some information for us to work with.
> >
> > Best,
> > Erick
> >
> >
> > On Fri, May 30, 2014 at 3:32 AM, sunshine glass <
> > sunshineglassof2day@gmail.com> wrote:
> >
> > > Hi Folks,
> > >
> > > Any updates ??
> > >
> > >
> > > On Wed, May 28, 2014 at 12:13 PM, sunshine glass <
> > > sunshineglassof2day@gmail.com> wrote:
> > >
> > > > Dear Team,
> > > >
> > > > How can I handle compound word searches in solr ?.
> > > > How can i search "hand bag" if I have "handbag" in my index. While
> > using
> > > > shingle in query analyzer, the query "ice cube" creates three tokens
> as
> > > > "ice","cube", "icecube". Only ice and cubes are searched but not
> > > > "icecubes".i.e not working for pair though I am using shingle filter.
> > > >
> > > > Here's the schema config.
> > > >
> > > >
> > > >    1.  <fieldType name="text" class="solr.TextField"
> > > >    positionIncrementGap="100">
> > > >    2.       <analyzer type="index">
> > > >    3.         <filter class="solr.SynonymFilterFactory"
> > > >    synonyms="synonyms_text_prime_index.txt" ignoreCase="true"
> > > expand="true"/>
> > > >    4.         <charFilter class="solr.HTMLStripCharFilterFactory"/>
> > > >    5.         <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >    6.          <filter class="solr.ShingleFilterFactory"
> > > >    maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
> > > >    7.          <filter class="solr.WordDelimiterFilterFactory"
> > > >    catenateWords="1" catenateNumbers="1" catenateAll="1"
> > > preserveOriginal="1"
> > > >    generateWordParts="1" generateNumberParts="1"/>
> > > >    8.         <filter class="solr.LowerCaseFilterFactory"/>
> > > >    9.         <filter class="solr.SnowballPorterFilterFactory"
> > > >    language="English" protected="protwords.txt"/>
> > > >    10.       </analyzer>
> > > >    11.       <analyzer type="query">
> > > >    12.         <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >    13.         <filter class="solr.SynonymFilterFactory"
> > > >    synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> > > >    14.         <filter class="solr.ShingleFilterFactory"
> > > >    maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
> > > >    15.         <filter class="solr.WordDelimiterFilterFactory"
> > > >    preserveOriginal="1"/>
> > > >    16.         <filter class="solr.LowerCaseFilterFactory"/>
> > > >    17.         <filter class="solr.SnowballPorterFilterFactory"
> > > >    language="English" protected="protwords.txt"/>
> > > >    18.       </analyzer>
> > > >    19.     </fieldType>
> > > >
> > > >    Any help is appreciated.
> > > >
> > > >
> > >
> >
>

RE: Searching words with spaces for word without spaces in solr

Posted by "Dyer, James" <Ja...@ingramcontent.com>.

In addition to the analyzer configuration you're using, you might want to also use WordBreakSolrSpellChecker to catch possible matches that can't easily be solved through analysis.  For more information, see the section for it at https://cwiki.apache.org/confluence/display/solr/Spell+Checking 

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: sunshine glass [mailto:sunshineglassof2day@gmail.com] 
Sent: Wednesday, July 30, 2014 9:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Searching words with spaces for word without spaces in solr

This is the new configuration:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
                outputUnigrams="true" tokenSeparator=""/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"
                language="English" protected="protwords.txt"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="stemmed_synonyms_text_prime_index.txt" ignoreCase="true"
                expand="true"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords_text_prime_search.txt" enablePositionIncrements="true"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
                outputUnigrams="true" tokenSeparator=""/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
        <filter class="solr.SnowballPorterFilterFactory"
                language="English" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

These are current docs in my index:

<result name="response" numFound="3" start="0">
<doc>
<str name="id">2</str>
<str name="title">Icecream</str>
<long name="_version_">1475063961342705664</long>
</doc>
<doc>
<str name="id">3</str>
<str name="title">Ice-cream</str>
<long name="_version_">1475063961344802816</long>
</doc>
<doc>
<str name="id">1</str>
<str name="title">Ice Cream</str>
<long name="_version_">1475063961203245056</long>
</doc>
</result>
</response>

Query:
http://localhost:8983/solr/collection1/select?q=title:ice+cream&debug=true

Response:

<result name="response" numFound="2" start="0">
<doc>
<str name="id">1</str>
<str name="title">Ice Cream</str>
<long name="_version_">1475063961203245056</long>
</doc>
<doc>
<str name="id">3</str>
<str name="title">Ice-cream</str>
<long name="_version_">1475063961344802816</long>
</doc>
</result>
<lst name="debug">
<str name="rawquerystring">title:ice cream</str>
<str name="querystring">title:ice cream</str>
<str name="parsedquery">
(+(title:ice DisjunctionMaxQuery((title:cream))))/no_coord
</str>
<str name="parsedquery_toString">+(title:ice (title:cream))</str>
<lst name="explain">
<str name="1">
0.875 = (MATCH) sum of: 0.4375 = (MATCH) weight(title:ice in 0)
[DefaultSimilarity], result of: 0.4375 = score(doc=0,freq=2.0 =
termFreq=2.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 =
idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.61871845 = fieldWeight
in 0, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 =
termFreq=2.0 1.0 = idf(docFreq=2, maxDocs=3) 0.4375 = fieldNorm(doc=0)
0.4375 = (MATCH) weight(title:cream in 0) [DefaultSimilarity], result of:
0.4375 = score(doc=0,freq=2.0 = termFreq=2.0 ), product of: 0.70710677 =
queryWeight, product of: 1.0 = idf(docFreq=2, maxDocs=3) 0.70710677 =
queryNorm 0.61871845 = fieldWeight in 0, product of: 1.4142135 =
tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 1.0 = idf(docFreq=2,
maxDocs=3) 0.4375 = fieldNorm(doc=0)
</str>
<str name="3">
0.70710677 = (MATCH) sum of: 0.35355338 = (MATCH) weight(title:ice in 2)
[DefaultSimilarity], result of: 0.35355338 = score(doc=2,freq=1.0 =
termFreq=1.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 =
idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.5 = fieldWeight in 2,
product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.0 =
idf(docFreq=2, maxDocs=3) 0.5 = fieldNorm(doc=2) 0.35355338 = (MATCH)
weight(title:cream in 2) [DefaultSimilarity], result of: 0.35355338 =
score(doc=2,freq=1.0 = termFreq=1.0 ), product of: 0.70710677 =
queryWeight, product of: 1.0 = idf(docFreq=2, maxDocs=3) 0.70710677 =
queryNorm 0.5 = fieldWeight in 2, product of: 1.0 = tf(freq=1.0), with freq
of: 1.0 = termFreq=1.0 1.0 = idf(docFreq=2, maxDocs=3) 0.5 =
fieldNorm(doc=2)
</str>
</lst>

Still not working ????


On Fri, May 30, 2014 at 9:21 PM, Erick Erickson <er...@gmail.com>
wrote:

> I'd spend some time with the admin/analysis page to understand the exact
> tokenization going on here. For instance, sequencing the
> shinglefilterfactory before worddelimiterfilterfactory may produce
> "interesting" results. And then throwing the Snowball factory at it and
> putting synonyms in front.... I suspect you're not indexing or searching
> what you think you are.
>
> Second, what happens when you query with &debug=query? That'll show you
> what the search string looks like.
>
> If that doesn't help, please post the results of looking at those things
> here, that'll provide some information for us to work with.
>
> Best,
> Erick
>
>
> On Fri, May 30, 2014 at 3:32 AM, sunshine glass <
> sunshineglassof2day@gmail.com> wrote:
>
> > Hi Folks,
> >
> > Any updates ??
> >
> >
> > On Wed, May 28, 2014 at 12:13 PM, sunshine glass <
> > sunshineglassof2day@gmail.com> wrote:
> >
> > > Dear Team,
> > >
> > > How can I handle compound word searches in solr ?.
> > > How can i search "hand bag" if I have "handbag" in my index. While
> using
> > > shingle in query analyzer, the query "ice cube" creates three tokens as
> > > "ice","cube", "icecube". Only ice and cubes are searched but not
> > > "icecubes".i.e not working for pair though I am using shingle filter.
> > >
> > > Here's the schema config.
> > >
> > >
> > >    1.  <fieldType name="text" class="solr.TextField"
> > >    positionIncrementGap="100">
> > >    2.       <analyzer type="index">
> > >    3.         <filter class="solr.SynonymFilterFactory"
> > >    synonyms="synonyms_text_prime_index.txt" ignoreCase="true"
> > expand="true"/>
> > >    4.         <charFilter class="solr.HTMLStripCharFilterFactory"/>
> > >    5.         <tokenizer class="solr.StandardTokenizerFactory"/>
> > >    6.          <filter class="solr.ShingleFilterFactory"
> > >    maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
> > >    7.          <filter class="solr.WordDelimiterFilterFactory"
> > >    catenateWords="1" catenateNumbers="1" catenateAll="1"
> > preserveOriginal="1"
> > >    generateWordParts="1" generateNumberParts="1"/>
> > >    8.         <filter class="solr.LowerCaseFilterFactory"/>
> > >    9.         <filter class="solr.SnowballPorterFilterFactory"
> > >    language="English" protected="protwords.txt"/>
> > >    10.       </analyzer>
> > >    11.       <analyzer type="query">
> > >    12.         <tokenizer class="solr.StandardTokenizerFactory"/>
> > >    13.         <filter class="solr.SynonymFilterFactory"
> > >    synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> > >    14.         <filter class="solr.ShingleFilterFactory"
> > >    maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
> > >    15.         <filter class="solr.WordDelimiterFilterFactory"
> > >    preserveOriginal="1"/>
> > >    16.         <filter class="solr.LowerCaseFilterFactory"/>
> > >    17.         <filter class="solr.SnowballPorterFilterFactory"
> > >    language="English" protected="protwords.txt"/>
> > >    18.       </analyzer>
> > >    19.     </fieldType>
> > >
> > >    Any help is appreciated.
> > >
> > >
> >
>

Re: Searching words with spaces for word without spaces in solr

Posted by sunshine glass <su...@gmail.com>.

This is the analysis page:



Please help me now.


On Wed, Jul 30, 2014 at 8:08 PM, sunshine glass <
sunshineglassof2day@gmail.com> wrote:

> This is the new configuration:
>
>     <fieldType name="text" class="solr.TextField"
>> positionIncrementGap="100">
>>       <analyzer type="index">
>>
>>         <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>         <tokenizer class="solr.StandardTokenizerFactory"/>
>>         <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>> outputUnigrams="true" tokenSeparator=""/>
>>         <filter class="solr.WordDelimiterFilterFactory"
>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>
>>         <filter class="solr.SnowballPorterFilterFactory"
>> language="English" protected="protwords.txt"/>
>>           <filter class="solr.SynonymFilterFactory"
>> synonyms="stemmed_synonyms_text_prime_index.txt" ignoreCase="true"
>> expand="true"/>
>>
>>       </analyzer>
>>       <analyzer type="query">
>>         <tokenizer class="solr.StandardTokenizerFactory"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>         <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords_text_prime_search.txt" enablePositionIncrements="true" />
>>
>>         <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>> outputUnigrams="true" tokenSeparator=""/>
>>         <filter class="solr.WordDelimiterFilterFactory"
>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
>>
>>         <filter class="solr.SnowballPorterFilterFactory"
>> language="English" protected="protwords.txt"/>
>>       </fieldType>
>>
>>
> These are current docs in my index:
>
> <result name="response" numFound="3" start="0">
> <doc>
> <str name="id">2</str>
> <str name="title">Icecream</str>
> <long name="_version_">1475063961342705664</long>
> </doc>
> <doc>
> <str name="id">3</str>
> <str name="title">Ice-cream</str>
> <long name="_version_">1475063961344802816</long>
> </doc>
> <doc>
> <str name="id">1</str>
> <str name="title">Ice Cream</str>
> <long name="_version_">1475063961203245056</long>
> </doc>
> </result>
> </response>
>
> Query:
> http://localhost:8983/solr/collection1/select?q=title:ice+cream&debug=true
>
> Response:
>
> <result name="response" numFound="2" start="0">
> <doc>
> <str name="id">1</str>
> <str name="title">Ice Cream</str>
> <long name="_version_">1475063961203245056</long>
> </doc>
> <doc>
> <str name="id">3</str>
> <str name="title">Ice-cream</str>
> <long name="_version_">1475063961344802816</long>
> </doc>
> </result>
> <lst name="debug">
> <str name="rawquerystring">title:ice cream</str>
> <str name="querystring">title:ice cream</str>
> <str name="parsedquery">
> (+(title:ice DisjunctionMaxQuery((title:cream))))/no_coord
> </str>
> <str name="parsedquery_toString">+(title:ice (title:cream))</str>
> <lst name="explain">
> <str name="1">
> 0.875 = (MATCH) sum of:
>   0.4375 = (MATCH) weight(title:ice in 0) [DefaultSimilarity], result of:
>     0.4375 = score(doc=0,freq=2.0 = termFreq=2.0 ), product of:
>       0.70710677 = queryWeight, product of:
>         1.0 = idf(docFreq=2, maxDocs=3)
>         0.70710677 = queryNorm
>       0.61871845 = fieldWeight in 0, product of:
>         1.4142135 = tf(freq=2.0), with freq of:
>           2.0 = termFreq=2.0
>         1.0 = idf(docFreq=2, maxDocs=3)
>         0.4375 = fieldNorm(doc=0)
>   0.4375 = (MATCH) weight(title:cream in 0) [DefaultSimilarity], result of:
>     0.4375 = score(doc=0,freq=2.0 = termFreq=2.0 ), product of:
>       0.70710677 = queryWeight, product of:
>         1.0 = idf(docFreq=2, maxDocs=3)
>         0.70710677 = queryNorm
>       0.61871845 = fieldWeight in 0, product of:
>         1.4142135 = tf(freq=2.0), with freq of:
>           2.0 = termFreq=2.0
>         1.0 = idf(docFreq=2, maxDocs=3)
>         0.4375 = fieldNorm(doc=0)
> </str>
> <str name="3">
> 0.70710677 = (MATCH) sum of:
>   0.35355338 = (MATCH) weight(title:ice in 2) [DefaultSimilarity], result of:
>     0.35355338 = score(doc=2,freq=1.0 = termFreq=1.0 ), product of:
>       0.70710677 = queryWeight, product of:
>         1.0 = idf(docFreq=2, maxDocs=3)
>         0.70710677 = queryNorm
>       0.5 = fieldWeight in 2, product of:
>         1.0 = tf(freq=1.0), with freq of:
>           1.0 = termFreq=1.0
>         1.0 = idf(docFreq=2, maxDocs=3)
>         0.5 = fieldNorm(doc=2)
>   0.35355338 = (MATCH) weight(title:cream in 2) [DefaultSimilarity], result of:
>     0.35355338 = score(doc=2,freq=1.0 = termFreq=1.0 ), product of:
>       0.70710677 = queryWeight, product of:
>         1.0 = idf(docFreq=2, maxDocs=3)
>         0.70710677 = queryNorm
>       0.5 = fieldWeight in 2, product of:
>         1.0 = tf(freq=1.0), with freq of:
>           1.0 = termFreq=1.0
>         1.0 = idf(docFreq=2, maxDocs=3)
>         0.5 = fieldNorm(doc=2)
> </str>
> </lst>
>
> Still not working: the query returns "Ice Cream" (id 1) and "Ice-cream"
> (id 3), but not "Icecream" (id 2).
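
A possible reading of the debug output above, offered as an assumption rather than something confirmed in this thread: the parsed query holds only title:ice and title:cream, which is what one would see if the query parser split the input on whitespace before handing each piece to the field's analyzer. In that case ShingleFilterFactory only ever sees one token at a time and cannot build "icecream". The Python sketch below is a toy model of that effect, not Solr code; shingle, analyze, and analyze_presplit are hypothetical helper names.

```python
from typing import List


def shingle(tokens: List[str], max_size: int = 2, separator: str = "") -> List[str]:
    """Emit each unigram plus every shingle of 2..max_size adjacent tokens."""
    out: List[str] = []
    for start in range(len(tokens)):
        for size in range(1, max_size + 1):
            window = tokens[start:start + size]
            if len(window) == size:  # skip windows that run past the end
                out.append(separator.join(window))
    return out


def analyze(text: str) -> List[str]:
    """Toy analyzer: whitespace tokenizer, lowercase, then shingles."""
    return shingle(text.lower().split())


def analyze_presplit(query: str) -> List[List[str]]:
    """Assumed parser behaviour: analyze each whitespace-separated chunk alone."""
    return [analyze(chunk) for chunk in query.split()]


# The analyzer sees the whole string, so the pair can be joined.
whole = analyze("ice cream")
# The analyzer sees one chunk at a time, so no pair is ever formed.
per_chunk = analyze_presplit("ice cream")

print(whole)
print(per_chunk)
```

If this model matches what is happening, the analysis page and the live query would disagree, because the analysis page feeds the whole string to the analyzer in one piece.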
>
>
> On Fri, May 30, 2014 at 9:21 PM, Erick Erickson <er...@gmail.com>
> wrote:
>
>> I'd spend some time with the admin/analysis page to understand the exact
>> tokenization going on here. For instance, sequencing the
>> ShingleFilterFactory before WordDelimiterFilterFactory may produce
>> "interesting" results. And then throwing the Snowball factory at it and
>> putting synonyms in front.... I suspect you're not indexing or searching
>> what you think you are.
>>
>> Second, what happens when you query with &debug=query? That'll show you
>> what the search string looks like.
>>
>> If that doesn't help, please post the results of looking at those things
>> here, that'll provide some information for us to work with.
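
As a minimal sketch of the second suggestion, the debug request can be built like this in Python. The host, port, and core name are taken from the query URL quoted earlier in this thread and are assumptions about the local setup; debug_query_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Base URL copied from the query quoted in this thread; adjust host and core.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def debug_query_url(q: str, base: str = SOLR_SELECT) -> str:
    """Build a select URL that asks only for query-parsing debug output."""
    return base + "?" + urlencode({"q": q, "debug": "query"})


url = debug_query_url("title:ice cream")
print(url)
```

The parsedquery entry in the response to that URL is the part to compare against the tokens shown on the admin/analysis page.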
>>
>> Best,
>> Erick
>>
>>
>> On Fri, May 30, 2014 at 3:32 AM, sunshine glass <
>> sunshineglassof2day@gmail.com> wrote:
>>
>> > Hi Folks,
>> >
>> > Any updates?
>> >
>> >
>> > On Wed, May 28, 2014 at 12:13 PM, sunshine glass <
>> > sunshineglassof2day@gmail.com> wrote:
>> >
>> > > Dear Team,
>> > >
>> > > How can I handle compound-word searches in Solr?
>> > > How can I match a query for "hand bag" when my index contains
>> > > "handbag"? With a shingle filter in the query analyzer, the query
>> > > "ice cube" produces three tokens: "ice", "cube", and "icecube". Only
>> > > "ice" and "cube" are searched, not "icecube"; i.e. the combined pair
>> > > is not searched even though I am using the shingle filter.
>> > >
>> > > Here's the schema config.
>> > >
>> > >
>> > > <fieldType name="text" class="solr.TextField"
>> > >     positionIncrementGap="100">
>> > >   <analyzer type="index">
>> > >     <filter class="solr.SynonymFilterFactory"
>> > >         synonyms="synonyms_text_prime_index.txt" ignoreCase="true"
>> > >         expand="true"/>
>> > >     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>> > >     <tokenizer class="solr.StandardTokenizerFactory"/>
>> > >     <filter class="solr.ShingleFilterFactory"
>> > >         maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
>> > >     <filter class="solr.WordDelimiterFilterFactory"
>> > >         catenateWords="1" catenateNumbers="1" catenateAll="1"
>> > >         preserveOriginal="1"
>> > >         generateWordParts="1" generateNumberParts="1"/>
>> > >     <filter class="solr.LowerCaseFilterFactory"/>
>> > >     <filter class="solr.SnowballPorterFilterFactory"
>> > >         language="English" protected="protwords.txt"/>
>> > >   </analyzer>
>> > >   <analyzer type="query">
>> > >     <tokenizer class="solr.StandardTokenizerFactory"/>
>> > >     <filter class="solr.SynonymFilterFactory"
>> > >         synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>> > >     <filter class="solr.ShingleFilterFactory"
>> > >         maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
>> > >     <filter class="solr.WordDelimiterFilterFactory"
>> > >         preserveOriginal="1"/>
>> > >     <filter class="solr.LowerCaseFilterFactory"/>
>> > >     <filter class="solr.SnowballPorterFilterFactory"
>> > >         language="English" protected="protwords.txt"/>
>> > >   </analyzer>
>> > > </fieldType>
>> > >
>> > > Any help is appreciated.
>> > >
>> > >
>> >
>>
>
>