Posted to solr-user@lucene.apache.org by Gastone Penzo <ga...@gmail.com> on 2013/12/20 14:42:37 UTC

Spellchecking problem

Hello,

I have a problem with spellchecking.
I use Solr to index e-commerce products (DVDs, CDs, books, etc.).
There is only one collection, but the index has a field, typology (of
product).
When I build the spellchecking indexes, they are built together.
How can I get suggestions for only one typology?

I read that if I use spellcheck.collate=true and spellcheck.maxCollationTries > 0,
Solr evaluates every suggestion with the fq parameter of the query. In my query
I have, for example, fq=typology:book,
but it doesn't work. Why?

I also tried collationparameter.fq=typology:book,
with the same result.

I use Solr 4.3.
Thank you


-- 
*Gastone Penzo*

RE: Spellchecking problem

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
Gastone,

At least while developing, you may want to specify "spellcheck.collateExtendedResults=true" so you can confirm that each collation was verified and see how many hits it would return.

But my guess is that your "mm" parameter makes pretty much anything return some hits.  You might want to specify "spellcheck.collateParam.mm=100%" or something like that to restrict collations to only those queries that return hits if all the terms were required.

See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collateParam.XX .

James Dyer
Ingram Content Group
(615) 213-4311
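
As a rough illustration of the two suggestions above, here is a minimal Python sketch that assembles a request with "spellcheck.collateExtendedResults" and "spellcheck.collateParam.mm" added. The host and handler path are taken from the query string posted in this thread, the qf/pf parameters are left out for brevity, and the function name build_spellcheck_url is made up for the example. It only builds the URL and does not contact Solr.

```python
from urllib.parse import urlencode

# Host and handler as posted in the thread; adjust for your own setup.
BASE_URL = "http://seshat:9000/solr/browse/"


def build_spellcheck_url(base_url):
    """Build a query URL with collation verification switched on.

    Hypothetical helper for illustration only; qf/pf are omitted.
    """
    params = [
        ("q", "otto maialotto"),
        ("fq", "shelf:GIO"),
        ("mm", "2<-1 5<80%"),
        ("spellcheck", "true"),
        ("spellcheck.q", "otto il maialotto"),
        ("spellcheck.collate", "true"),
        ("spellcheck.maxCollationTries", "10"),
        # Report how many hits each verified collation would return.
        ("spellcheck.collateExtendedResults", "true"),
        # Override mm only for the collation test queries.
        ("spellcheck.collateParam.mm", "100%"),
    ]
    # urlencode percent-encodes '<', '%' and ':' and turns spaces into '+'.
    return base_url + "?" + urlencode(params)


url = build_spellcheck_url(BASE_URL)
print(url)
```

The main query keeps its original mm; only the internal queries used to test collations are run with mm=100%.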


Re: Spellchecking problem

Posted by Gastone Penzo <ga...@gmail.com>.
Thank you for your answer.

This is the query string:

http://seshat:9000/solr/browse/?q=otto+maialotto&fq=shelf:GIO&qf=ean^0
title^0.0035 authors^0 publisher^0 series^0 contributors^0 characters^0
manufacturer^0 actors^0 directors^0 tags^0 category_label^0 &pf=ean^0
title^0.0035 authors^0 publisher^0 series^0 contributors^0 characters^0
manufacturer^0 actors^0 directors^0 tags^0
category_label^0&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=10&spellcheck.q=otto+il+maialotto&mm=2%3C-1+5%3C80%25&

shelf is the field that represents the product typology, and GIO is the
typology (games).
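
For reference, the mm value in the query string above decodes to "2<-1 5<80%". The Python sketch below is a simplified reading of the documented min-should-match rules, not Solr's actual code, and the function names are made up for the example. It shows how many optional clauses such a spec would require for a given number of query terms.

```python
def _simple_spec(clause_count, spec):
    """Apply a non-conditional spec such as '3', '-1', '80%' or '-25%'."""
    if spec.endswith("%"):
        percent = int(spec[:-1])
        # Truncate toward zero, then count down from the total if negative.
        portion = abs(clause_count * percent) // 100
        result = clause_count - portion if percent < 0 else portion
    else:
        value = int(spec)
        result = clause_count + value if value < 0 else value
    return max(result, 0)


def min_should_match(clause_count, spec):
    """Approximate the number of optional clauses that must match."""
    spec = spec.strip()
    if "<" not in spec:
        return _simple_spec(clause_count, spec)
    result = clause_count  # below the first bound, every clause is required
    for part in spec.split():
        bound, value = part.split("<", 1)
        if clause_count <= int(bound):
            return result
        result = _simple_spec(clause_count, value)
    return result


for n in (2, 3, 6):
    print(n, min_should_match(n, "2<-1 5<80%"), min_should_match(n, "100%"))
```

Under this reading, a three-term query needs only two of its terms to match with "2<-1 5<80%", while "100%" requires all three.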

The problem is the collation:
the result gives (otto il polpo), which is the name of a product of another
typology (book).
Why?

The result is this:

<lst name="spellcheck">
    <lst name="suggestions">
        <lst name="otto il maialotto">
            <int name="numFound">5</int>
            <int name="startOffset">0</int>
            <int name="endOffset">17</int>
            <int name="origFreq">0</int>
            <arr name="suggestion">
                <lst>
                    <str name="word">otto il polpo</str>
                    <int name="freq">2</int>
                </lst>
                <lst>
                    <str name="word">gigetto il maialetto  vol.0</str>
                    <int name="freq">2</int>
                </lst>
                <lst>
                    <str name="word">sotto il mare  vol.0</str>
                    <int name="freq">2</int>
                </lst>
                <lst>
                    <str name="word">sotto il mare</str>
                    <int name="freq">2</int>
                </lst>
                <lst>
                    <str name="word">otto il rinoceronte</str>
                    <int name="freq">2</int>
                </lst>
            </arr>
        </lst>
        <bool name="correctlySpelled">true</bool>
        <str name="collation">(otto il polpo)</str>
    </lst>
</lst>
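
To compare the individual suggestions with the collation in a response like the one above, a small Python sketch using the standard library XML parser can help. The SAMPLE string is a shortened copy of the response in this message, and the function name summarize_spellcheck is made up for the example. It handles only the plain collation form shown here, not the nested form returned when spellcheck.collateExtendedResults=true.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the spellcheck section posted above.
SAMPLE = """<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="otto il maialotto">
      <int name="numFound">5</int>
      <arr name="suggestion">
        <lst><str name="word">otto il polpo</str><int name="freq">2</int></lst>
        <lst><str name="word">otto il rinoceronte</str><int name="freq">2</int></lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">true</bool>
    <str name="collation">(otto il polpo)</str>
  </lst>
</lst>"""


def summarize_spellcheck(xml_text):
    """Return ([(word, freq), ...], [collation, ...]) from a spellcheck section."""
    root = ET.fromstring(xml_text)
    suggestions = root.find("./lst[@name='suggestions']")
    words = [
        (item.findtext("str[@name='word']"), int(item.findtext("int[@name='freq']")))
        for item in suggestions.findall(".//arr[@name='suggestion']/lst")
    ]
    collations = [el.text for el in suggestions.findall("str[@name='collation']")]
    return words, collations


words, collations = summarize_spellcheck(SAMPLE)
print(words)
print(collations)
```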

This is the configuration:


    <str name="queryAnalyzerFieldType">textSpell</str>

    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spellcheckdef</str>
      <str name="spellcheckIndexDir">spellchecker</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.onlyMorePopular">false</str>
      <str name="spellcheck.extendedResults">true</str>
      <str name="spellcheck.count">6</str>
      <str name="spellcheck.collate">true</str>
      <float name="thresholdTokenFrequency">.0000001</float>
    </lst>

  </searchComponent>

Thanks

-- 
*Gastone Penzo*

RE: Spellchecking problem

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
If you are using "spellcheck.maxCollationTries" with a value greater than 0, the *collation* section of your spellcheck response will give query corrections that are proven to produce hits.  Possibly you were looking at the first section, where it gives individual word suggestions?  Or maybe one of your query parameters is misspelled (check the case, and that you have "spellcheck." in front of all of them)?  If you can't figure it out, send us the entire query string you're using, the spellcheck response you get back, and the relevant portions of solrconfig.xml.

James Dyer
Ingram Content Group
(615) 213-4311
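
The check for misspelled or wrongly cased parameter names can be roughly automated. The Python sketch below compares request parameter names with a short, non-exhaustive list of SpellCheckComponent parameters; the list is an assumption to be checked against the documentation for your Solr version, and the function name suspicious_params is made up for the example.

```python
# Non-exhaustive list of SpellCheckComponent request parameters (assumption:
# verify against the documentation for the Solr version in use).
KNOWN_PARAMS = {
    "spellcheck",
    "spellcheck.q",
    "spellcheck.build",
    "spellcheck.reload",
    "spellcheck.dictionary",
    "spellcheck.count",
    "spellcheck.onlyMorePopular",
    "spellcheck.extendedResults",
    "spellcheck.collate",
    "spellcheck.maxCollations",
    "spellcheck.maxCollationTries",
    "spellcheck.maxCollationEvaluations",
    "spellcheck.collateExtendedResults",
    "spellcheck.accuracy",
}
# Anything after this prefix is passed through to the collation test queries.
COLLATE_PARAM_PREFIX = "spellcheck.collateParam."


def suspicious_params(names):
    """Map each doubtful spellcheck-like name to a case-corrected hint or None."""
    by_lower = {name.lower(): name for name in KNOWN_PARAMS}
    problems = {}
    for name in names:
        if name in KNOWN_PARAMS or name.startswith(COLLATE_PARAM_PREFIX):
            continue
        lowered = name.lower()
        # Only look at names that seem meant for the spellchecker.
        if "spell" in lowered or "collat" in lowered:
            problems[name] = by_lower.get(lowered)
    return problems


report = suspicious_params([
    "q", "fq", "spellcheck.collate", "spellcheck.maxcollationtries",
    "collationparameter.fq", "spellcheck.collateParam.mm",
])
print(report)
```

A name mapped to None has no case-insensitive match in the list, so it is either missing from the list or not a spellcheck parameter at all.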

