Posted to solr-user@lucene.apache.org by "Rafał Piekarski (RaVbaker)" <ra...@gmail.com> on 2011/08/20 17:04:25 UTC

Too many results in dismax queries with one word

Hi all,

I have a database of e-commerce products (5M) and am trying to build a search
solution for it.

I have used stemmer, edge n-gram and Double Metaphone phonetic fields to
tolerate common typos in queries. This works quite well with the dismax QParser
for queries longer than one word: "tv lc20", "sny psp 3001", "cannon 5d",
etc. To avoid getting too many results I tuned the `mm` parameter. But
when a user types a single word like "ipad" or "cannon", I always get a lot of
results (~60000). This is unacceptable for my client. He would like to have
only the `good` results, those that specifically match the query. This is
hard for me to accomplish because of the Double Metaphone field, which converts
words like "apt", "opt", "ipad" and even "ipod" to the same phonetic code,
APT. All of these words then match about equally well, which gives me
a huge number of results. I have similar problems with other words like
"canon", "canine" and "cannon", which all become KNN phonetically but
have different meanings: "canon" is a camera brand, "canine" is cat food, and
"cannon" may be a misspelling of "canon" or part of a book title about cannon
weapons.

My first idea was to make a second requestHandler that does not search the
*_phonetic fields, and to use it for queries with only one word. But that
didn't work, because sometimes I want to correct the user even when there is
only one word and suggest something better. The query "cannon" is a good
example. I'm fairly sure that most of the time when someone types "cannon" it
is a typo for "canon", and I want to show the user CANON cameras as well.
That's why I can't use a second requestHandler for one-word queries.

I'm looking for any ideas on how I could change my requestHandler.

My regular queries look like this: http://localhost:8983/solr/select?q=cannon

Below is my configuration for the requestHandler and schema.xml.



solrconfig.xml:

<requestHandler name="search" class="solr.SearchHandler" default="true">
    <lst name="defaults">
        <str name="q.alt">*:*</str>
        <str name="defType">dismax</str>
        <str name="qf">
            title^1.3 title_text^0.9 title_phonetic^0.74 title_ng^0.17
            title_ngram^0.54
            producer_name^0.9 producer_name_text^0.89
            category_path_text^0.8 category_path_phonetic^0.65
            description^0.60 description_text^0.56
        </str>
        <str name="pf">title_text^1.1 title^1.2 description^0.3</str>
        <int name="ps">3</int>
        <str name="tie">0.1</str>
        <str name="mm">2&lt;100% 3&lt;-1 5&lt;85%</str>

        <str name="fl">*,score</str>
    </lst>
</requestHandler>
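
Editor's note: the `mm` value above may be easier to reason about with a small calculation. The Python sketch below mirrors the conditional min-should-match rule as I understand it from the dismax documentation; the function name `min_should_match` is made up for illustration and is not part of Solr. The spec is written with a plain `<` here (it is `&lt;` only because of XML escaping).

```python
def min_should_match(clause_count: int, spec: str) -> int:
    """Number of optional clauses that must match for a dismax-style `mm` spec.

    Illustrative re-implementation (an assumption, not Solr's own code).
    """
    result = clause_count
    spec = spec.strip()
    if "<" in spec:
        # Conditional form "N<value": the value applies only when the query
        # has more than N optional clauses; otherwise all clauses are required.
        for part in spec.split():
            bound, value = part.split("<", 1)
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, value)
        return result
    if spec.endswith("%"):
        # Percentage, truncated toward zero; negative means "all but this share".
        calc = int(clause_count * int(spec[:-1]) / 100)
    else:
        # Plain integer; negative means "all but this many".
        calc = int(spec)
    result = clause_count + calc if calc < 0 else calc
    return max(0, min(clause_count, result))


MM = "2<100% 3<-1 5<85%"
for n in range(1, 8):
    print(n, "clauses ->", min_should_match(n, MM), "required")
```

Under this reading, a one-word query always requires its single clause, so `mm` has no way to narrow one-word queries; the result count there depends only on which `qf` fields are searched.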


schema.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="XX" version="1.2">
    <types>
        <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
        <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
        <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" />
        <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true" />
        <fieldType name="decimal" class="solr.TrieFloatField" precisionStep="2" omitNorms="true" positionIncrementGap="0" />

        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
            <analyzer>
                <charFilter class="solr.HTMLStripCharFilterFactory" />
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <!-- Case insensitive stop word removal.
                     add enablePositionIncrements=true in both the index and query
                     analyzers to leave a 'gap' for more accurate phrase queries.
                -->
                <filter class="solr.StopFilterFactory"
                        ignoreCase="true"
                        words="stopwords_pl.txt"
                        enablePositionIncrements="true" />
                <filter class="solr.WordDelimiterFilterFactory"
                        generateWordParts="1" generateNumberParts="1" catenateWords="1"
                        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.TrimFilterFactory" />
                <filter class="solr.StempelPolishStemFilterFactory" />
            </analyzer>
        </fieldType>

        <fieldType name="text_gen" class="solr.TextField" positionIncrementGap="100">
            <analyzer>
                <charFilter class="solr.HTMLStripCharFilterFactory" />
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.StopFilterFactory"
                        ignoreCase="true"
                        words="stopwords_pl.txt"
                        enablePositionIncrements="true" />
                <filter class="solr.WordDelimiterFilterFactory"
                        generateWordParts="1" generateNumberParts="1" catenateWords="1"
                        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.TrimFilterFactory" />
            </analyzer>
        </fieldType>

        <fieldtype name="phonetic" stored="false" indexed="true" class="solr.TextField">
            <analyzer>
                <tokenizer class="solr.StandardTokenizerFactory" />
                <filter class="solr.StopFilterFactory"
                        ignoreCase="true"
                        words="stopwords_pl.txt"
                        enablePositionIncrements="true" />
                <filter class="solr.DoubleMetaphoneFilterFactory" inject="false" maxCodeLength="8" />
            </analyzer>
        </fieldtype>

        <fieldtype name="ngram" class="solr.TextField">
            <analyzer type="index">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.StopFilterFactory"
                        ignoreCase="true"
                        words="stopwords_pl.txt"
                        enablePositionIncrements="true" />
                <filter class="solr.WordDelimiterFilterFactory"
                        generateWordParts="1" generateNumberParts="1" catenateWords="1"
                        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
                <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="3" />
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="3" />
            </analyzer>
        </fieldtype>

        <fieldtype name="edgengram" class="solr.TextField">
            <analyzer>
                <tokenizer class="solr.StandardTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.StopFilterFactory"
                        ignoreCase="true"
                        words="stopwords_pl.txt"
                        enablePositionIncrements="true" />
                <filter class="solr.WordDelimiterFilterFactory"
                        generateWordParts="1" generateNumberParts="1" catenateWords="1"
                        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
                <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front" />
            </analyzer>
        </fieldtype>
    </types>
    <fields>
        <field name="id" type="string" indexed="true" stored="true" required="true" />
        <field name="title" type="text_gen" indexed="true" stored="true" required="true" />
        <field name="category_path" type="string" indexed="true" stored="true" />

        <field name="producer_name" type="string" indexed="true" stored="false" />
        <field name="description" type="text_gen" indexed="false" stored="true" />

        <dynamicField name="*_text" type="text" indexed="true" stored="false" />
        <dynamicField name="*_ascii" type="text_ascii" indexed="true" stored="false" />
        <dynamicField name="*_phonetic" type="phonetic" indexed="true" stored="false" />
        <dynamicField name="*_ng" type="edgengram" indexed="true" stored="false" />
        <dynamicField name="*_ngram" type="ngram" indexed="true" stored="false" />
    </fields>
    <uniqueKey>id</uniqueKey>
    <defaultSearchField>title</defaultSearchField>
    <solrQueryParser defaultOperator="AND" />

    <copyField source="title" dest="title_sort" />
    <copyField source="title" dest="title_text" />
    <copyField source="title" dest="title_ascii" />
    <copyField source="title" dest="title_phonetic" />
    <copyField source="title" dest="title_ng" />
    <copyField source="title" dest="title_ngram" />

    <copyField source="producer_name" dest="producer_name_text" />
    <copyField source="producer_name" dest="producer_name_phonetic" />

    <copyField source="category_path" dest="category_path_text" />
    <copyField source="category_path" dest="category_path_phonetic" />
    <copyField source="description" dest="description_text" />

</schema>





-- 
Rafał "RaVbaker" Piekarski.

web: http://ja.ravbaker.net
mail: ravbaker@gmail.com
jid/xmpp/aim: ravbaker@gmail.com
mobile: +48-663-808-481

Re: Too many results in dismax queries with one word

Posted by "Rafał Piekarski (RaVbaker)" <ra...@gmail.com>.
Thanks very much for your advice. I think I now understand better how to
make good use of Solr. I have tested the spellchecker and it looks like it
will let me achieve better results, and hopefully we will satisfy the client.

In my solution I will change the user query to use or not use the phonetic
fields based on the results from spellcheck.collation and the frequency of the
words. If I am not sure which is better, I'll ask the user through "did you
mean" and log the reply to make better choices in the future.
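
Editor's note: the decision described above could be sketched roughly as follows. This is only an illustration in Python; the function name `choose_strategy`, the `ratio` threshold and the way the frequencies are obtained are all assumptions, not something stated in this thread or provided by Solr.

```python
from typing import Optional


def choose_strategy(query_freq: int,
                    collation: Optional[str],
                    collation_freq: int = 0,
                    ratio: int = 10) -> str:
    """Pick how to run the search for a one-word query.

    query_freq     -- index frequency of the word the user typed (assumed known)
    collation      -- spellchecker's suggested rewrite, or None if it had none
    collation_freq -- index frequency of the suggested word
    ratio          -- hypothetical threshold: how much more popular the
                      suggestion must be before we trust it silently
    """
    if collation is None:
        # Nothing better to offer: search only the non-phonetic fields.
        return "strict"
    if collation_freq >= ratio * max(query_freq, 1):
        # The suggestion is far more common: treat the input as a typo
        # and include the *_phonetic fields.
        return "phonetic"
    # Not clear which one the user wants: show "did you mean" and log the answer.
    return "ask_user"


print(choose_strategy(query_freq=40, collation="canon", collation_freq=5000))
print(choose_strategy(query_freq=12, collation=None))
print(choose_strategy(query_freq=300, collation="canine", collation_freq=900))
```

The logged "did you mean" answers could later be used to tune `ratio` per word instead of keeping one global value.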

Once again, thanks a lot, guys.

This is an example of my query to the spellchecker:

http://localhost:8983/solr/select?spellcheck=true&q=cannon&rows=0&spellcheck.collate=true&spellcheck.count=10&spellcheck.onlyMorePopular=true&spellcheck.extendedResults=on
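
Editor's note: when the query word comes from user input, it is safer to build this URL with proper escaping than by string concatenation. A minimal Python sketch, assuming the same host, handler path and parameters as the URL above (the helper name `spellcheck_url` is hypothetical):

```python
from urllib.parse import urlencode


def spellcheck_url(query: str,
                   base: str = "http://localhost:8983/solr/select") -> str:
    """Build the spellcheck request URL for a user query, escaping the input."""
    params = [
        ("spellcheck", "true"),
        ("q", query),
        ("rows", 0),  # only the suggestions are needed, not the documents
        ("spellcheck.collate", "true"),
        ("spellcheck.count", 10),
        ("spellcheck.onlyMorePopular", "true"),
        ("spellcheck.extendedResults", "on"),
    ]
    return base + "?" + urlencode(params)


print(spellcheck_url("cannon"))
```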


On Sun, Aug 21, 2011 at 6:36 PM, Erick Erickson <er...@gmail.com> wrote:

> I think Sujit has hit the nail on the head. Any program you try to write
> that tries to guess what the user *really* meant will require endless
> tinkering and *still* won't be right. If you only knew how annoying I
> find Google's attempts to "help".....
>
> So perhaps concentrating on some interaction with the user, who is,
> after all, the only one who really knows what they want is the best
> approach.
>
> Best
> Erick
>
> On Sun, Aug 21, 2011 at 12:26 PM, Sujit Pal <su...@comcast.net> wrote:
> > Would it make sense to have a "Did you mean?" type of functionality for
> > which you use the EdgeNGram and Metaphone filters /if/ you don't get
> > appropriate results for the user query?
> >
> > So when the user types "cannon" and the application notices that there are
> > no cannons for sale in the index (0 results with standard analysis), it
> > then makes another query with the EdgeNGram and/or Metaphone filters and
> > comes back with:
> >
> > Did you mean "Canon", "Canine"?
> >
> > Clicking on "Canon" or "Canine" would fire off a query for these terms.
> >
> > That way your application doesn't guess what is right, it goes back and
> > asks the user what he wants.
> >
> > -sujit
> >
> > On Sun, 2011-08-21 at 17:19 +0200, Rafał Piekarski (RaVbaker) wrote:
> >> Thanks for the reply. I know that meeting all of a client's needs is
> >> sometimes impossible, but then the client recalls that a competing
> >> (commercial) product already does that (though it has other problems, like
> >> performance). And then I'm obliged to try more tricks. :/
> >>
> >> I'm currently using Solr 3.1 but am thinking about migrating to the latest
> >> stable version, 3.3.
> >>
> >> You are correct: to meet the client's needs I have also used some hacks
> >> with boosting queries (the `bq` and `bf` parameters), but I omitted them to
> >> keep the XML clearer.
> >>
> >> You mentioned faceting. This is also one of my (my client's?) problems. In
> >> the user interface they want to show 5 categories for products. Those 5
> >> should be the most relevant ones. When I take the categories with the
> >> highest counts for one-word queries, most of the time they are "not the
> >> ones that should be there". For example, the query "ipad" actually has only
> >> 12 truly relevant products, in the category "tablets", but the phonetic
> >> code APT also matches part of the model name of hundreds of UPS power
> >> supplies and bathtubs. And those are on the list, not tablets. :/
> >>
> >> But you mentioned autocomplete, which is something I haven't looked at
> >> yet. I'll try that and show it to my client.
> >>
> >>
> >> On Sun, Aug 21, 2011 at 4:20 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> >>
> >> > The root problem here is "This is unacceptable for my client". The first
> >> > thing I'd suggest is that you work with your client and get them to define
> >> > what is acceptable. You'll be forever changing things (to no good purpose)
> >> > if all they can say is "that's not right".
> >> >
> >> > For instance, you apparently have two competing requirements:
> >> > 1> try to correct users input, which inevitably increases the results
> >> > returned
> >> > 2> narrow the search to the "right" results.
> >> >
> >> > You can't have both every time!
> >> >
> >> > So you could try something like going with a more-restrictive search
> >> > (no metaphone comparison) first and, if the results returned weren't
> >> > sufficient, firing the "broader" query back, without showing the
> >> > too-small results first.
> >> >
> >> > You could work with your client and see if what they really want is
> >> > just the most relevant results at the top of the list, in which case
> >> > you can play with the dismax field boosts
> >> > (by the way, what version of Solr are you using?)
> >> >
> >> > You could work with the client to understand the user experience if
> >> > you use autocomplete and/or faceting etc. to guide their explorations.
> >> >
> >> > You could...
> >> >
> >> > But none of that will help unless and until you and your client can
> >> > agree what is the correct behavior ahead of time.
> >> >
> >> > Best
> >> > Erick
> >> >

Re: Too many results in dismax queries with one word

Posted by Erick Erickson <er...@gmail.com>.
I think Sujit has hit the nail on the head. Any program you try to write
that tries to guess what the user *really* meant will require endless
tinkering and *still* won't be right. If you only knew how annoying I
find Google's attempts to "help".....

So perhaps concentrating on some interaction with the user, who is,
after all, the only one who really knows what they want is the best approach.

Best
Erick


Re: Too many results in dismax queries with one word

Posted by Sujit Pal <su...@comcast.net>.
Would it make sense to have a "Did you mean?" type of functionality for
which you use the EdgeNGram and Metaphone filters /if/ you don't get
appropriate results for the user query?

So when the user types "cannon" and the application notices that there are
no cannons for sale in the index (0 results with standard analysis), it
then makes another query with the EdgeNGram and/or Metaphone filters and
comes back with:

Did you mean "Canon", "Canine"?

Clicking on "Canon" or "Canine" would fire off a query for these terms.

That way your application doesn't guess what is right, it goes back and
asks the user what he wants.

-sujit
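
A rough sketch of the two-step flow described above, in Python. It assumes a
hypothetical `run_query` callable that sends parameters to Solr and returns a
simplified dict (`numFound`, `docs`, `facets`); that shape is an assumption of
this sketch and is not Solr's raw response format. The field names come from
the `qf` posted in this thread, but splitting them into a strict and a broad
list is also an assumption. Suggestions are read from facet counts on
`producer_name`, because that field is indexed but not stored.

```python
from typing import Callable, Dict, List

# Field names come from the qf posted in the thread; splitting them into a
# strict and a broad list is an assumption of this sketch.
STRICT_QF = "title^1.3 title_text^0.9 producer_name^0.9 producer_name_text^0.89"
BROAD_QF = STRICT_QF + " title_phonetic^0.74 title_ng^0.17 title_ngram^0.54"


def build_params(query: str, qf: str, suggest: bool = False,
                 limit: int = 3) -> Dict[str, object]:
    """Build dismax request parameters; with suggest=True only facets are requested."""
    params: Dict[str, object] = {"q": query, "defType": "dismax", "qf": qf, "rows": 10}
    if suggest:
        params.update({
            "rows": 0,                      # documents are not needed, only facet counts
            "facet": "true",
            "facet.field": "producer_name",
            "facet.mincount": 1,
            "facet.limit": limit + 1,       # one spare in case the query itself shows up
        })
    return params


def search_with_suggestions(query: str,
                            run_query: Callable[[Dict[str, object]], dict],
                            limit: int = 3) -> Dict[str, List]:
    """Run the strict query first; fall back to 'Did you mean?' only on zero hits."""
    strict = run_query(build_params(query, STRICT_QF))
    if strict["numFound"] > 0:
        return {"docs": strict["docs"], "did_you_mean": []}

    # No exact matches: ask the phonetic/n-gram fields which producers match.
    broad = run_query(build_params(query, BROAD_QF, suggest=True, limit=limit))
    names = [name for name, _count in broad["facets"].get("producer_name", [])
             if name.lower() != query.lower()]
    return {"docs": [], "did_you_mean": names[:limit]}
```

With this shape the application never mixes phonetic matches into the main
result list; it only offers them as clickable alternatives when the strict
query finds nothing.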


Re: Too many results in dismax queries with one word

Posted by "Rafał Piekarski (RaVbaker)" <ra...@gmail.com>.
Thanks for the reply. I know that meeting all of a client's needs is
sometimes impossible, but then the client points out that a competing
(commercial) product already does that (though it has other problems, like
performance). And then I'm obliged to try more tricks. :/

I'm currently using Solr 3.1 but am thinking about migrating to the latest
stable version, 3.3.

You're correct: to meet the client's needs I have also used some hacks with
boosting (the `bq` and `bf` parameters), but I omitted them to keep the XML
clearer.

You mentioned faceting. This is also one of my (my client's?) problems. In
the user interface they want to show 5 categories for products. Those 5
should be the most relevant ones. When I take the categories with the highest
counts for one-word queries, most of the time they are "not the ones that
should be there". For example, the query "ipad" has only 12 truly relevant
products, all in the category "tablets", but the phonetic code APT also
matches part of the model name for hundreds of UPS power supplies and
bathtubs. And those are the categories on the list, not tablets. :/

You also mentioned autocomplete, which is something I haven't looked at
yet. I'll try that and show it to my client.
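
One possible way to approach the category problem described above, as a rough
Python sketch: count categories only over the best-scoring documents instead
of over every match. It assumes the application already holds a list of
result dicts with `score` and the stored `category_path` field; the function
name and the `top_n` cutoff are hypothetical, and this is client-side
counting, not a Solr faceting feature.

```python
from collections import Counter
from typing import Dict, List


def top_categories(docs: List[Dict], top_n: int = 100, k: int = 5) -> List[str]:
    """Return the k most frequent categories among the top_n highest-scoring docs."""
    ranked = sorted(docs, key=lambda d: d["score"], reverse=True)[:top_n]
    counts = Counter(d["category_path"] for d in ranked if d.get("category_path"))
    return [category for category, _count in counts.most_common(k)]
```

The cutoff trades recall for relevance: a small `top_n` keeps weak phonetic
matches out of the category list, but it can also hide a legitimate category
whose products score slightly lower.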

-- 
Rafał "RaVbaker" Piekarski.

web: http://ja.ravbaker.net
mail: ravbaker@gmail.com
jid/xmpp/aim: ravbaker@gmail.com
mobile: +48-663-808-481


On Sun, Aug 21, 2011 at 4:20 PM, Erick Erickson <er...@gmail.com> wrote:

> The root problem here is "This is unacceptable for my client". The first
> thing I'd suggest is that you work with your client and get them to define
> what is acceptable. You'll be forever changing things (to no good purpose)
> if all they can say is "that's not right".
>
> For instance, you apparently have two competing requirements:
> 1> try to correct the user's input, which inevitably increases the number
> of results returned
> 2> narrow the search to the "right" results.
>
> You can't have both every time!
>
> So you could try something like going with a more restrictive search
> (no metaphone comparison) first and, if the results returned weren't
> sufficient, firing the "broader" query, without showing the too-small
> results first.
>
> You could work with your client and see if what they really want is
> just the most relevant results at the top of the list, in which case
> you can play with the dismax field boosts (by the way, what version of
> Solr are you using?)
>
> You could work with the client to understand the user experience if
> you use autocomplete and/or faceting etc. to guide their explorations.
>
> You could...
>
> But none of that will help unless and until you and your client can
> agree ahead of time on what the correct behavior is.
>
> Best
> Erick
>
> On Sat, Aug 20, 2011 at 11:04 AM, Rafał Piekarski (RaVbaker)
> <ra...@gmail.com> wrote:
> > Hi all,
> >
> > I have a database of e-commerce products (5M) and am trying to build a
> > search solution for it.
> >
> > I have used stemmer, edge n-gram and DoubleMetaphone phonetic fields to
> > tolerate common typos in queries. This works quite well with the dismax
> > QParser for queries longer than one word: "tv lc20", "sny psp 3001",
> > "cannon 5d", etc. To avoid getting too many results I tuned the `mm`
> > parameter. But when a user types a single word like "ipad" or "cannon",
> > I always get a lot of results (~60000). This is unacceptable for my
> > client. He would like to see only the `good` results, the ones that
> > specifically match the query. That is hard for me to accomplish because
> > the DoubleMetaphone field converts words like "apt", "opt", "ipad" and
> > even "ipod" to the same phonetic code, APT. All of these words then
> > match about equally well, which gives me a huge number of results. I
> > have similar problems with words like "canon", "canine" and "cannon",
> > which are all KNN phonetically but have different meanings: "canon" is a
> > camera, "canine" is cat food, and "cannon" may be a misspelling of
> > "canon" or part of a book title about cannon weapons.
> >
> > My first idea was to make a second requestHandler that does not search
> > the *_phonetic fields and to use it for one-word queries. But that
> > didn't work, because sometimes I want to correct the user even if there
> > is only one word and suggest something better. The query "cannon" is a
> > good example. I'm fairly sure that most of the time when someone types
> > "cannon" it is a typo for "canon", and I want to show the user CANON
> > cameras as well. That's why I can't use a second requestHandler for
> > one-word queries.
> >
> > I'm looking for any ideas on how I could change my requestHandler.
> >
> > My regular queries are: http://localhost:8983/solr/select?q=cannon
> >
> > My configuration for the requestHandler and schema.xml is below.
> >
> >
> >
> > solrconfig.xml:
> >
> > <requestHandler name="search" class="solr.SearchHandler" default="true">
> >   <lst name="defaults">
> >     <str name="q.alt">*:*</str>
> >     <str name="defType">dismax</str>
> >     <str name="qf">
> >         title^1.3 title_text^0.9 title_phonetic^0.74 title_ng^0.17
> >         title_ngram^0.54
> >         producer_name^0.9 producer_name_text^0.89
> >         category_path_text^0.8 category_path_phonetic^0.65
> >         description^0.60 description_text^0.56
> >     </str>
> >     <str name="pf">title_text^1.1 title^1.2 description^0.3</str>
> >     <int name="ps">3</int>
> >     <str name="tie">0.1</str>
> >     <str name="mm">2&lt;100% 3&lt;-1 5&lt;85%</str>
> >
> >     <str name="fl">*,score</str>
> >   </lst>
> > </requestHandler>
> >
> >
> > schema.xml:
> >
> > <?xml version="1.0" encoding="UTF-8" ?>
> > <schema name="XX" version="1.2">
> >    <types>
> >        <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
> >                   omitNorms="true" positionIncrementGap="0" />
> >        <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
> >                   omitNorms="true" positionIncrementGap="0" />
> >        <fieldType name="string" class="solr.StrField"
> >                   sortMissingLast="true" omitNorms="true" />
> >        <fieldType name="boolean" class="solr.BoolField"
> >                   sortMissingLast="true" omitNorms="true" />
> >        <fieldType name="decimal" class="solr.TrieFloatField"
> >                   precisionStep="2" omitNorms="true" positionIncrementGap="0" />
> >
> >        <fieldType name="text" class="solr.TextField"
> >                   positionIncrementGap="100">
> >            <analyzer>
> >                <charFilter class="solr.HTMLStripCharFilterFactory" />
> >                <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >                <!-- Case insensitive stop word removal.
> >                     add enablePositionIncrements=true in both the index and query
> >                     analyzers to leave a 'gap' for more accurate phrase queries.
> >                -->
> >                <filter class="solr.StopFilterFactory"
> >                        ignoreCase="true"
> >                        words="stopwords_pl.txt"
> >                        enablePositionIncrements="true" />
> >                <filter class="solr.WordDelimiterFilterFactory"
> >                        generateWordParts="1" generateNumberParts="1"
> >                        catenateWords="1" catenateNumbers="1"
> >                        catenateAll="0" splitOnCaseChange="1" />
> >                <filter class="solr.LowerCaseFilterFactory" />
> >                <filter class="solr.TrimFilterFactory" />
> >                <filter class="solr.StempelPolishStemFilterFactory" />
> >            </analyzer>
> >        </fieldType>
> >
> >        <fieldType name="text_gen" class="solr.TextField"
> >                   positionIncrementGap="100">
> >            <analyzer>
> >                <charFilter class="solr.HTMLStripCharFilterFactory" />
> >                <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >                <filter class="solr.StopFilterFactory"
> >                        ignoreCase="true"
> >                        words="stopwords_pl.txt"
> >                        enablePositionIncrements="true" />
> >                <filter class="solr.WordDelimiterFilterFactory"
> >                        generateWordParts="1" generateNumberParts="1"
> >                        catenateWords="1" catenateNumbers="1"
> >                        catenateAll="0" splitOnCaseChange="1" />
> >                <filter class="solr.LowerCaseFilterFactory" />
> >                <filter class="solr.TrimFilterFactory" />
> >            </analyzer>
> >        </fieldType>
> >
> >        <fieldtype name="phonetic" stored="false" indexed="true"
> >                   class="solr.TextField">
> >            <analyzer>
> >                <tokenizer class="solr.StandardTokenizerFactory" />
> >                <filter class="solr.StopFilterFactory"
> >                        ignoreCase="true"
> >                        words="stopwords_pl.txt"
> >                        enablePositionIncrements="true" />
> >                <filter class="solr.DoubleMetaphoneFilterFactory"
> >                        inject="false" maxCodeLength="8" />
> >            </analyzer>
> >        </fieldtype>
> >
> >        <fieldtype name="ngram" class="solr.TextField">
> >            <analyzer type="index">
> >                <tokenizer class="solr.StandardTokenizerFactory" />
> >                <filter class="solr.LowerCaseFilterFactory" />
> >                <filter class="solr.StopFilterFactory"
> >                        ignoreCase="true"
> >                        words="stopwords_pl.txt"
> >                        enablePositionIncrements="true" />
> >                <filter class="solr.WordDelimiterFilterFactory"
> >                        generateWordParts="1" generateNumberParts="1"
> >                        catenateWords="1" catenateNumbers="1"
> >                        catenateAll="0" splitOnCaseChange="1" />
> >                <filter class="solr.NGramFilterFactory"
> >                        minGramSize="2" maxGramSize="3" />
> >            </analyzer>
> >            <analyzer type="query">
> >                <tokenizer class="solr.StandardTokenizerFactory" />
> >                <filter class="solr.LowerCaseFilterFactory" />
> >                <filter class="solr.NGramFilterFactory"
> >                        minGramSize="2" maxGramSize="3" />
> >            </analyzer>
> >        </fieldtype>
> >
> >        <fieldtype name="edgengram" class="solr.TextField">
> >            <analyzer>
> >                <tokenizer class="solr.StandardTokenizerFactory" />
> >                <filter class="solr.LowerCaseFilterFactory" />
> >                <filter class="solr.StopFilterFactory"
> >                        ignoreCase="true"
> >                        words="stopwords_pl.txt"
> >                        enablePositionIncrements="true" />
> >                <filter class="solr.WordDelimiterFilterFactory"
> >                        generateWordParts="1" generateNumberParts="1"
> >                        catenateWords="1" catenateNumbers="1"
> >                        catenateAll="0" splitOnCaseChange="1" />
> >                <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
> >                        maxGramSize="15" side="front" />
> >            </analyzer>
> >        </fieldtype>
> >
> >    </types>
> >    <fields>
> >        <field name="id" type="string" indexed="true" stored="true"
> > required="true" />
> >        <field name="title" type="text_gen" indexed="true" stored="true"
> > required="true" />
> >        <field name="category_path" type="string" indexed="true"
> > stored="true" />
> >
> >        <field name="producer_name" type="string" indexed="true"
> > stored="false" />
> >        <field name="description" type="text_gen" indexed="false"
> > stored="true" />
> >
> >  <dynamicField name="*_text" type="text" indexed="true" stored="false" />
> >
> >  <dynamicField name="*_ascii" type="text_ascii" indexed="true"
> > stored="false" />
> >  <dynamicField name="*_phonetic" type="phonetic" indexed="true"
> > stored="false" />
> >  <dynamicField name="*_ng" type="edgengram" indexed="true" stored="false"
> />
> >
> >  <dynamicField name="*_ngram" type="ngram" indexed="true" stored="false"
> />
> >
> >
> >    </fields>
> >    <uniqueKey>id</uniqueKey>
> >    <defaultSearchField>title</defaultSearchField>
> >    <solrQueryParser defaultOperator="AND" />
> >
> >    <copyField source="title" dest="title_sort" />
> >  <copyField source="title" dest="title_text" />
> > <copyField source="title" dest="title_ascii" />
> >    <copyField source="title" dest="title_phonetic" />
> >    <copyField source="title" dest="title_ng" />
> >    <copyField source="title" dest="title_ngram"/>
> >
> >  <copyField source="producer_name" dest="producer_name_text" />
> >  <copyField source="producer_name" dest="producer_name_phonetic" />
> >
> >    <copyField source="category_path" dest="category_path_text" />
> > <copyField source="category_path" dest="category_path_phonetic" />
> >   <copyField source="description" dest="description_text" />
> >
> > </schema>
> >
> >
> >
> >
> >
> > --
> > Rafał "RaVbaker" Piekarski.
> >
> > web: http://ja.ravbaker.net
> > mail: ravbaker@gmail.com
> > jid/xmpp/aim: ravbaker@gmail.com
> > mobile: +48-663-808-481
> >
>

Re: Too many results in dismax queries with one word

Posted by Erick Erickson <er...@gmail.com>.
The root problem here is "This is unacceptable for my client". The first
thing I'd suggest is that you work with your client and get them to define
what is acceptable. You'll be forever changing things (to no good purpose)
if all they can say is "that's not right".

For instance, you apparently have two competing requirements:
1> try to correct the user's input, which inevitably increases the number
   of results returned
2> narrow the search to the "right" results.

You can't have both every time!

So you could try a more restrictive search (no metaphone comparison)
first and, if it doesn't return enough results, fire the "broader" query
instead, without ever showing the too-small result set to the user.
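
A minimal sketch of what the restrictive first pass might look like. The
handler name "search_strict" and the boost values are only assumptions for
illustration; the field names come from the schema.xml posted earlier in
the thread, with the phonetic and n-gram fields simply left out of qf:

```xml
<!-- Hypothetical stricter handler: same dismax setup as the default
     "search" handler, but qf omits the phonetic and n-gram fields. -->
<requestHandler name="search_strict" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">
        title^1.3 title_text^0.9
        producer_name_text^0.89
        category_path_text^0.8
        description_text^0.56
    </str>
    <str name="pf">title_text^1.1 title^1.2</str>
    <int name="ps">3</int>
    <str name="tie">0.1</str>
    <str name="mm">2&lt;100% 3&lt;-1 5&lt;85%</str>
    <str name="fl">*,score</str>
  </lst>
</requestHandler>
```

Assuming your Solr version lets you pick a named handler with the qt
parameter, the application would send q=cannon&qt=search_strict first and
repeat the query against the default handler only when numFound in the
response falls below whatever threshold you and the client agree on.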

You could work with your client and see whether what they really want is
just the most relevant results at the top of the list, in which case you
can play with the dismax field boosts. (By the way, what version of Solr
are you using?)
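
As a rough illustration of the boost idea (the numbers below are made up
and would need tuning against real queries), the exact and stemmed fields
can be weighted far above the phonetic and n-gram ones. This does not
reduce the number of hits; it only pushes phonetic-only matches further
down the list:

```xml
<!-- Illustrative values only; these elements would go inside the
     handler's <lst name="defaults">. Exact and stemmed matches dominate,
     phonetic and n-gram matches act as a low-weight fallback. -->
<str name="qf">
    title^10 title_text^6
    producer_name_text^4
    category_path_text^2 description_text^1
    title_phonetic^0.3 category_path_phonetic^0.2
    title_ng^0.2 title_ngram^0.1
</str>
<str name="tie">0.1</str>
```

With a small tie, dismax scores a document mostly by its best-matching
field, so a product that matches "canon" literally in the title should
rank well above one that only matches through the phonetic code.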

You could work with the client to understand the user experience if you
use autocomplete and/or faceting etc. to guide their explorations.
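
For the faceting part, a small sketch, assuming you facet on the
category_path string field from your schema (the limit of 10 is an
arbitrary choice):

```xml
<!-- Sketch: these elements would go inside the handler's
     <lst name="defaults">. Facet counts per category let the UI offer
     ways to narrow a broad one-word query. -->
<str name="facet">true</str>
<str name="facet.field">category_path</str>
<str name="facet.mincount">1</str>
<int name="facet.limit">10</int>
```

A one-word query like "cannon" would then come back with counts per
category, so the user can choose between, say, cameras and books rather
than wading through the whole result list.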

You could...

But none of that will help unless and until you and your client can agree
ahead of time on what the correct behavior is.

Best
Erick
