Posted to solr-user@lucene.apache.org by Rohan Thakur <ro...@gmail.com> on 2013/03/15 07:52:25 UTC

had query regarding the indexing and analysers

Hi all,

I have this string in the title field:

Wipro  7710U Laptop-DUAL CORE 1.4 Ghz-120GB HDD

I have indexed it using text_en_splitting_tight,


and now I am searching with a query like q=dual core.

However, this title ranks low in the results: Solr does not match "dual"
in this string, only "core", so the score for this field is multiplied
by 1/2 and ends up lower.

How can I correct this? Can anyone help?

thanks
regards
Rohan

Re: had query regarding the indexing and analysers

Posted by Jack Krupansky <ja...@basetechnology.com>.

Yes, if the field type has only a single analyzer, or an index analyzer is
specified, and the Porter stemmer is used in it.

-- Jack Krupansky
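
To make the index-time versus query-time point concrete, here is a minimal Python sketch of a toy inverted index with pluggable analyzers. It is not Solr code: the `STEMS` lookup is a hypothetical stand-in for the Porter stemmer, and the two sample titles are made up for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for a real stemmer: only knows that "ace" becomes "ac".
STEMS = {"ace": "ac"}

def plain(text):
    """Analyzer without stemming: lowercase and split on whitespace."""
    return text.lower().split()

def stemming(text):
    """Analyzer with the stand-in stemmer applied to every token."""
    return [STEMS.get(t, t) for t in plain(text)]

def build_index(docs, analyzer):
    """Map each analyzed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index[term].add(doc_id)
    return index

def search(index, query, analyzer):
    """Return ids of documents matching any analyzed query term."""
    hits = set()
    for term in analyzer(query):
        hits |= index.get(term, set())
    return hits

# Made-up sample titles, not taken from the thread's index.
docs = {1: "Ace cooler", 2: "Split AC 1.5 ton"}
stemmed = build_index(docs, stemming)
unstemmed = build_index(docs, plain)

print(sorted(stemmed))                    # the stored terms include "ac" but not "ace"
print(search(stemmed, "ace", stemming))   # stemming on both sides: docs 1 and 2
print(search(unstemmed, "ace", plain))    # no stemming on either side: doc 1 only
print(search(stemmed, "ace", plain))      # mismatched analyzers: no hits
```

The last call suggests why the stemmer has to be removed from both the index and the query analyzer, followed by a reindex: a query term that is not stemmed can no longer find terms that were stored in stemmed form.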

-----Original Message----- 
From: Rohan Thakur
Sent: Monday, April 01, 2013 9:13 AM
To: solr-user@lucene.apache.org
Subject: Re: had query regarding the indexing and analysers

hi

Does this mean that "ace" is also stored as "ac" in the Solr index at
indexing time?

thanks
regards
Rohan


Re: had query regarding the indexing and analysers

Posted by Rohan Thakur <ro...@gmail.com>.

hi

Does this mean that "ace" is also stored as "ac" in the Solr index at
indexing time?

thanks
regards
Rohan


Re: had query regarding the indexing and analysers

Posted by Jack Krupansky <ja...@basetechnology.com>.

Actually, it's the Porter Stemmer that is turning "ace" into "ac".

Try making a copy of text_en_splitting and delete the 
PorterStemFilterFactory filter from both the query and index analyzers.

-- Jack Krupansky
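
As an illustration of why "ace" ends up as "ac", here is a minimal Python sketch of step 5a of the Porter algorithm alone (the rule for a final "e"). It is not Lucene's PorterStemFilter and it skips every other step of the algorithm; the helper names are made up for this example. For "ace" the earlier steps do not apply, so this one rule is enough to reproduce the result.

```python
VOWELS = "aeiou"

def is_consonant(word, i):
    """Porter's definition: a, e, i, o, u are vowels; 'y' is a vowel after a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True

def measure(stem):
    """Count vowel-to-consonant transitions: the 'm' in [C](VC)^m[V]."""
    m = 0
    prev_was_vowel = False
    for i in range(len(stem)):
        consonant = is_consonant(stem, i)
        if consonant and prev_was_vowel:
            m += 1
        prev_was_vowel = not consonant
    return m

def ends_cvc(stem):
    """Porter's *o condition: consonant-vowel-consonant, with the last not w, x or y."""
    n = len(stem)
    if n < 3:
        return False
    return (is_consonant(stem, n - 3)
            and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1)
            and stem[-1] not in "wxy")

def step5a(word):
    """Drop a final 'e' when m > 1, or when m == 1 and the stem does not end cvc."""
    if not word.endswith("e"):
        return word
    stem = word[:-1]
    m = measure(stem)
    if m > 1 or (m == 1 and not ends_cvc(stem)):
        return stem
    return word

for w in ("ace", "rate", "probate"):
    print(w, "->", step5a(w))
```

The stem "ac" has measure 1 and is too short to end in consonant-vowel-consonant, so the "e" is dropped; "rate" keeps its "e" because "rat" does end that way.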

-----Original Message----- 
From: Rohan Thakur
Sent: Wednesday, March 20, 2013 8:39 AM
To: solr-user@lucene.apache.org
Subject: Re: had query regarding the indexing and analysers

Hi Jack,

I was using text_en_splitting initially, but it changes my query as
well.
For example:
if I search for the term "ace", it is treated as "ac", thus giving
"split ac" a higher score...
See the debug output:

"debug":{
    "rawquerystring":"ace",
    "querystring":"ace",
    "parsedquery":"(+DisjunctionMaxQuery((title:ac^30.0)))/no_coord",
    "parsedquery_toString":"+(title:ac^30.0)",
    "explain":{
      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 469)
[DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 469,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.4375 = fieldNorm(doc=469)\n",
      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 470)
[DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 470,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.4375 = fieldNorm(doc=470)\n",
      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 471)
[DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 471,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.4375 = fieldNorm(doc=471)\n",
      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 472)
[DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 472,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.4375 = fieldNorm(doc=472)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 331)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 331,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=331)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 332)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 332,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=332)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 335)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 335,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=335)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 336)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 336,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=336)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 337)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 337,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=337)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 393)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 393,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=393)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 425)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 425,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=425)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 426)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 426,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=426)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 429)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 429,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=429)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 430)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 430,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=430)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 431)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 431,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=431)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 433)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 433,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=433)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 434)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 434,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=434)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 502)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 502,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=502)\n",
      "":"\n1.332154 = (MATCH) weight(title:ac^30.0 in 411)
[DefaultSimilarity], result of:\n  1.332154 = fieldWeight in 411,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.3125 = fieldNorm(doc=411)\n",
      "":"\n1.332154 = (MATCH) weight(title:ac^30.0 in 424)
[DefaultSimilarity], result of:\n  1.332154 = fieldWeight in 424,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.3125 = fieldNorm(doc=424)\n"},
    "QParser":"ExtendedDismaxQParser",



On Tue, Mar 19, 2013 at 7:37 PM, Jack Krupansky 
<ja...@basetechnology.com> wrote:

> Yeah, one ambiguity in typography is whether a hyphen is internal to a
> compound term (e.g., "CD-ROM") or a phrase separator as in your case. Some
> people are careful to put spaces around the hyphen for a phrase delimiter,
> but plenty of people still just drop it in directly adjacent to two words.
>
> In your case, text_en_splitting_tight is SPECIFICALLY trying to keep
> "Laptop-DUAL" together as a single term, so that "wi fi" is kept distinct
> from "Wi-Fi".
>
> Try text_en_splitting, which specifically is NOT trying to keep them
> together.
>
> The key clue here is that the former does not have generateWordParts="1".
> That is the option that is needed so that "Laptop-DUAL" will be indexed as
> "laptop dual".
>
> -- Jack Krupansky
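
The effect of generateWordParts described above can be sketched with a toy Python model. This is not the real WordDelimiterFilterFactory: it only splits on non-alphanumeric characters, it folds lowercasing into the same function, and the flag values passed for the two field types are assumptions about their stock settings. Only the parameter names mirror the filter's attributes.

```python
import re

def word_delimiter(token, generate_word_parts, catenate_words):
    """Toy model of splitting one token on intra-word delimiters such as '-'."""
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", token) if p]
    if len(parts) <= 1:
        return [token.lower()]          # nothing to split, pass the token through
    out = []
    if generate_word_parts:
        out.extend(parts)               # emit each part as its own term
    if catenate_words:
        out.append("".join(parts))      # emit the parts joined into one term
    return [t.lower() for t in out]

# Assumed settings: the "tight" type catenates but does not generate word parts.
tight = word_delimiter("Laptop-DUAL", generate_word_parts=False, catenate_words=True)
loose = word_delimiter("Laptop-DUAL", generate_word_parts=True, catenate_words=True)

print(tight)
print(loose)
print("dual" in tight, "dual" in loose)
```

Under these assumptions the tight variant indexes only the joined term, so a query term "dual" has nothing to match, while the variant that generates word parts also indexes "dual" as a term of its own.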
>
> -----Original Message----- From: Rohan Thakur
> Sent: Tuesday, March 19, 2013 3:35 AM
> To: solr-user@lucene.apache.org
> Subject: Re: had query regarding the indexing and analysers
>
>
> My default field is title only. I have used debug as well; it shows that
> Solr divides the query into "dual" and "core" and searches for both
> separately. When calculating the scores, it puts documents in which both
> terms appear first. In my case, for the document containing this title:
>
> Wipro  7710U Laptop-DUAL CORE 1.4 Ghz-120GB HDD
>
> Solr has found only the term "core", not "dual", I guess because "dual"
> is attached to the term "laptop". Even searching for "dual" alone does
> not return this document, which is why it shows up low in the search
> results. So I am not able to search for partial terms; I have to use
> *dual in the query, and then this document is found, but putting * in
> the query terms affects the scoring of other searches. I think I have to
> remove the "-" characters from the strings before indexing them. Point
> out if I am wrong anywhere.
>
> thanks
> regards
> Rohan
>
>
> On Sat, Mar 16, 2013 at 7:02 PM, Erick Erickson <er...@gmail.com>
> wrote:
>
>> See admin/analysis, it's invaluable. Probably
>> the terms are being searched against your default text field, which I'd
>> guess is not "title".
>>
>> Also, try adding &debug=all to your query and look in the debug info at
>> the parsed form of the query to see what's actually being searched.
>>
>> Best
>> Erick
>>
>>
>> On Fri, Mar 15, 2013 at 2:52 AM, Rohan Thakur <ro...@gmail.com>
>> wrote:
>>
>> > Hi all,
>> >
>> > I have this string in the title field:
>> >
>> > Wipro  7710U Laptop-DUAL CORE 1.4 Ghz-120GB HDD
>> >
>> > I have indexed it using text_en_splitting_tight,
>> >
>> >
>> > and now I am searching with a query like q=dual core.
>> >
>> > However, this title ranks low in the results: Solr does not match "dual"
>> > in this string, only "core", so the score for this field is multiplied
>> > by 1/2 and ends up lower.
>> >
>> > How can I correct this? Can anyone help?
>> >
>> > thanks
>> > regards
>> > Rohan
>> >
>>
>>
>

Re: had query regarding the indexing and analysers

Posted by Rohan Thakur <ro...@gmail.com>.

Hi Jack,

I was using text_en_splitting initially, but it changes my query as
well.
For example:
if I search for the term "ace", it is treated as "ac", thus giving
"split ac" a higher score...
See the debug output:

"debug":{
    "rawquerystring":"ace",
    "querystring":"ace",
    "parsedquery":"(+DisjunctionMaxQuery((title:ac^30.0)))/no_coord",
    "parsedquery_toString":"+(title:ac^30.0)",
    "explain":{
      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 469)
[DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 469,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.4375 = fieldNorm(doc=469)\n",
      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 470)
[DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 470,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.4375 = fieldNorm(doc=470)\n",
      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 471)
[DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 471,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.4375 = fieldNorm(doc=471)\n",
      "":"\n1.8650155 = (MATCH) weight(title:ac^30.0 in 472)
[DefaultSimilarity], result of:\n  1.8650155 = fieldWeight in 472,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.4375 = fieldNorm(doc=472)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 331)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 331,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=331)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 332)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 332,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=332)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 335)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 335,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=335)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 336)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 336,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=336)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 337)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 337,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=337)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 393)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 393,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=393)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 425)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 425,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=425)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 426)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 426,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=426)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 429)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 429,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=429)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 430)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 430,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=430)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 431)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 431,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=431)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 433)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 433,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=433)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 434)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 434,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=434)\n",
      "":"\n1.5985848 = (MATCH) weight(title:ac^30.0 in 502)
[DefaultSimilarity], result of:\n  1.5985848 = fieldWeight in 502,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n    0.375
= fieldNorm(doc=502)\n",
      "":"\n1.332154 = (MATCH) weight(title:ac^30.0 in 411)
[DefaultSimilarity], result of:\n  1.332154 = fieldWeight in 411,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.3125 = fieldNorm(doc=411)\n",
      "":"\n1.332154 = (MATCH) weight(title:ac^30.0 in 424)
[DefaultSimilarity], result of:\n  1.332154 = fieldWeight in 424,
product of:\n    1.0 = tf(freq=1.0), with freq of:\n      1.0 =
termFreq=1.0\n    4.2628927 = idf(docFreq=39, maxDocs=1045)\n
0.3125 = fieldNorm(doc=424)\n"},
    "QParser":"ExtendedDismaxQParser",
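
The explain entries above all follow the same arithmetic, which can be checked by hand. A minimal sketch, assuming Lucene's DefaultSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function name field_weight is only illustrative, and the inputs are copied from the debug output:

```python
import math

def field_weight(freq, doc_freq, max_docs, field_norm):
    """Recompute fieldWeight = tf * idf * fieldNorm as listed in the explain output."""
    tf = math.sqrt(freq)                              # assumed DefaultSimilarity tf
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # assumed DefaultSimilarity idf
    return tf * idf * field_norm

# The three fieldNorm values that appear in the debug output.
for norm in (0.4375, 0.375, 0.3125):
    print(norm, round(field_weight(1.0, 39, 1045, norm), 5))
```

Every hit matches the single stemmed term ac exactly once, so tf and idf are identical across the hits and the ordering among them comes down to fieldNorm alone.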




Re: had query regarding the indexing and analysers

Posted by Jack Krupansky <ja...@basetechnology.com>.

Yeah, one ambiguity in typography is whether a hyphen is internal to a 
compound term (e.g., "CD-ROM") or a phrase separator as in your case. Some 
people are careful to put spaces around the hyphen for a phrase delimiter, 
but plenty of people still just drop it in directly adjacent to two words.

In your case, text_en_splitting_tight is SPECIFICALLY trying to keep 
"Laptop-DUAL" together as a single term, so that "wi fi" is kept distinct 
from "Wi-Fi".

Try text_en_splitting, which specifically is NOT trying to keep them 
together.

The key clue here is that the former does not have generateWordParts="1". 
That is the option that is needed so that "Laptop-DUAL" will be indexed as 
"laptop dual".

-- Jack Krupansky
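
The effect of generateWordParts described above can be illustrated outside Solr. This is only a rough simulation in Python, not Solr's WordDelimiterFilter: it splits a token on non-alphanumeric characters and lowercases the result, and it assumes word catenation is switched on in both field types. The function name word_delimiter and its parameters are made up for the illustration.

```python
import re

def word_delimiter(token, generate_word_parts, catenate_words=True):
    """Very rough stand-in for a word-delimiter filter followed by lowercasing.

    Only splitting on non-alphanumeric characters is modelled; the real
    filter's case-change and number handling are ignored here.
    """
    parts = [p.lower() for p in re.split(r"[^A-Za-z0-9]+", token) if p]
    if len(parts) <= 1:
        return parts                      # nothing to split, token passes through
    out = []
    if generate_word_parts:
        out.extend(parts)                 # the separate word parts
    if catenate_words:
        out.append("".join(parts))        # the parts joined back together
    return out

print(word_delimiter("Laptop-DUAL", generate_word_parts=False))  # tight-style
print(word_delimiter("Laptop-DUAL", generate_word_parts=True))   # splitting-style
```

Under these assumptions the first call yields only the joined token, so a query for dual has nothing to match, while the second call also emits laptop and dual as separate terms.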


Re: had query regarding the indexing and analysers

Posted by Erick Erickson <er...@gmail.com>.

Please review: http://wiki.apache.org/solr/UsingMailingLists. You're
forcing us to guess at what's going on. You haven't posted the
results of adding debug=query (or debug=all). You haven't shown
us the field definitions for the fields in question. You haven't
given us much info to help us help you.

Best
Erick



Re: had query regarding the indexing and analysers

Posted by Rohan Thakur <ro...@gmail.com>.

My default field is title only. I have used debug as well; it shows that Solr
divides the query into dual and core and then searches for both separately.
When calculating the scores, it favors documents in which both terms
appear. In my case, for the document containing this title:

Wipro  7710U Laptop-DUAL CORE 1.4 Ghz-120GB HDD

Solr has found only the term core, not dual, I guess because dual is
attached to the term laptop. Even when I search for only dual, this
document does not show up, which is why it appears low in the search
results. So I am not able to search for partial terms; for that I have
to use *dual in the query. Then this document is found, but the scoring
of other searches is affected when I put * in the query terms. I think I
have to remove the "-" characters from the strings before indexing them.
Please correct me if I am wrong anywhere.

thanks
regards
Rohan
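
For reference, the preprocessing idea in this message (dropping the "-" before indexing) could look roughly like the following. It is a sketch in Python with a made-up helper name, split_hyphenated, and it is not part of Solr; it replaces a hyphen only when it sits directly between two word characters.

```python
import re

def split_hyphenated(title):
    """Replace a hyphen that sits directly between two word characters with a space."""
    return re.sub(r"(?<=\w)-(?=\w)", " ", title)

print(split_hyphenated("Wipro  7710U Laptop-DUAL CORE 1.4 Ghz-120GB HDD"))
```

Note that this would also break up genuine compounds such as "CD-ROM", so changing the analysis of the field may be preferable to rewriting the data.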



Re: had query regarding the indexing and analysers

Posted by Erick Erickson <er...@gmail.com>.

See admin/analysis, it's invaluable.

The terms are probably being searched against your default text field, which
I'd guess is not "title".

Also, try adding &debug=all to your query and look in the debug info at the
parsed form of the query to see what's actually being searched.

Best
Erick
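
As a small illustration of the debug suggestion above, the request URL can be assembled like this. The host, port and core name are placeholders rather than details from this thread, and the helper name debug_query_url is made up; only the q, debug and wt parameters are meant literally.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; adjust to the actual installation.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"

def debug_query_url(q, base=SOLR_SELECT):
    """Build a select URL that asks Solr to include debug info in the response."""
    params = {
        "q": q,
        "debug": "all",   # the parameter suggested above
        "wt": "json",     # ask for a JSON response
    }
    return base + "?" + urlencode(params)

print(debug_query_url("dual core"))
```

The parsed form of the query then appears under the "debug" key of the response, next to the per-document explain entries.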
