Posted to solr-user@lucene.apache.org by "S.L" <si...@gmail.com> on 2014/05/24 23:21:24 UTC

Wordbreak spellchecker excessive breaking.

I am using the Solr wordbreak spellchecker. When I search for a term like
"mob ile", I expect the wordbreak spellchecker to return a suggestion for
"mobile". Instead, it breaks the search term into letters like "m o b".
I have two questions about this behavior.

 1. How can I make Solr combine "mob ile" into "mobile"?
 2. Notwithstanding the fact that my search term "mob ile" is being broken
incorrectly into individual letters, I realize that word breaking is needed
in certain cases. How do I control it so that it does not break a term into
letters like "m o b", which seems like excessive breaking to me?

Thanks.

Re: Wordbreak spellchecker excessive breaking.

Posted by "S.L" <si...@gmail.com>.
Anyone?



Re: Wordbreak spellchecker excessive breaking.

Posted by "S.L" <si...@gmail.com>.
James,

Thanks. There is no error in the logs; it is just that I do not get the
startup message in the log.

I do not see any warm-up related configuration for any spell checker in my
solrconfig.xml. I have pasted the autowarm-related configuration below.

<query>
    <!-- Max Boolean Clauses

         Maximum number of clauses in each BooleanQuery,  an exception
         is thrown if exceeded.

         ** WARNING **

         This option actually modifies a global Lucene property that
         will affect all SolrCores.  If multiple solrconfig.xml files
         disagree on this property, the value at any given moment will
         be based on the last SolrCore to be initialized.

      -->
    <maxBooleanClauses>1024</maxBooleanClauses>


    <!-- Solr Internal Query Caches

         There are two implementations of cache available for Solr,
         LRUCache, based on a synchronized LinkedHashMap, and
         FastLRUCache, based on a ConcurrentHashMap.

         FastLRUCache has faster gets and slower puts in single
         threaded operation and thus is generally faster than LRUCache
         when the hit ratio of the cache is high (> 75%), and may be
         faster under other scenarios on multi-cpu systems.
    -->

    <!-- Filter Cache

         Cache used by SolrIndexSearcher for filters (DocSets),
         unordered sets of *all* documents that match a query.  When a
         new searcher is opened, its caches may be prepopulated or
         "autowarmed" using data from caches in the old searcher.
         autowarmCount is the number of items to prepopulate.  For
         LRUCache, the autowarmed items will be the most recently
         accessed items.

         Parameters:
           class - the SolrCache implementation
               (LRUCache or FastLRUCache)
           size - the maximum number of entries in the cache
           initialSize - the initial capacity (number of entries) of
               the cache.  (see java.util.HashMap)
           autowarmCount - the number of entries to prepopulate from
               an old cache.
      -->
    <filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="0"/>

    <!-- Query Result Cache

         Caches results of searches - ordered lists of document ids
         (DocList) based on a query, a sort, and the range of documents
         requested.
      -->
    <queryResultCache class="solr.LRUCache"
                     size="512"
                     initialSize="512"
                     autowarmCount="0"/>

    <!-- Document Cache

         Caches Lucene Document objects (the stored fields for each
         document).  Since Lucene internal document ids are transient,
         this cache will not be autowarmed.
      -->
    <documentCache class="solr.LRUCache"
                   size="512"
                   initialSize="512"
                   autowarmCount="0"/>

    <!-- Field Value Cache

         Cache used to hold field values that are quickly accessible
         by document id.  The fieldValueCache is created by default
         even if not configured here.
      -->
    <!--
       <fieldValueCache class="solr.FastLRUCache"
                        size="512"
                        autowarmCount="128"
                        showItems="32" />
      -->

    <!-- Custom Cache

         Example of a generic cache.  These caches may be accessed by
         name through SolrIndexSearcher.getCache(),cacheLookup(), and
         cacheInsert().  The purpose is to enable easy caching of
         user/application level data.  The regenerator argument should
         be specified as an implementation of solr.CacheRegenerator
         if autowarming is desired.
      -->
    <!--
       <cache name="myUserCache"
              class="solr.LRUCache"
              size="4096"
              initialSize="1024"
              autowarmCount="1024"
              regenerator="com.mycompany.MyRegenerator"
              />
      -->


    <!-- Lazy Field Loading

         If true, stored fields that are not requested will be loaded
         lazily.  This can result in a significant speed improvement
         if the usual case is to not load all stored fields,
         especially if the skipped fields are large compressed text
         fields.
    -->
    <enableLazyFieldLoading>true</enableLazyFieldLoading>

   <!-- Use Filter For Sorted Query

        A possible optimization that attempts to use a filter to
        satisfy a search.  If the requested sort does not include
        score, then the filterCache will be checked for a filter
        matching the query. If found, the filter will be used as the
        source of document ids, and then the sort will be applied to
        that.

        For most situations, this will not be useful unless you
        frequently get the same search repeatedly with different sort
        options, and none of them ever use "score"
     -->
   <!--
      <useFilterForSortedQuery>true</useFilterForSortedQuery>
     -->

   <!-- Result Window Size

        An optimization for use with the queryResultCache.  When a search
        is requested, a superset of the requested number of document ids
        are collected.  For example, if a search for a particular query
        requests matching documents 10 through 19, and queryWindowSize is
        50, then documents 0 through 49 will be collected and cached.  Any
        further requests in that range can be satisfied via the cache.
     -->
   <queryResultWindowSize>20</queryResultWindowSize>

   <!-- Maximum number of documents to cache for any entry in the
        queryResultCache.
     -->
   <queryResultMaxDocsCached>200</queryResultMaxDocsCached>

   <!-- Query Related Event Listeners

        Various IndexSearcher related events can trigger Listeners to
        take actions.

        newSearcher - fired whenever a new searcher is being prepared
        and there is a current searcher handling requests (aka
        registered).  It can be used to prime certain caches to
        prevent long request times for certain requests.

        firstSearcher - fired whenever a new searcher is being
        prepared but there is no current registered searcher to handle
        requests or to gain autowarming data from.


     -->
    <!-- QuerySenderListener takes an array of NamedList and executes a
         local query request for each NamedList in sequence.
      -->
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <!--
           <lst><str name="q">solr</str><str name="sort">price asc</str></lst>
           <lst><str name="q">rocks</str><str name="sort">weight asc</str></lst>
          -->
      </arr>
    </listener>
    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">static firstSearcher warming in solrconfig.xml</str>
        </lst>
      </arr>
    </listener>

    <!-- Use Cold Searcher

         If a search request comes in and there is no current
         registered searcher, then immediately register the still
         warming searcher and use it.  If "false" then all requests
         will block until the first searcher is done warming.
      -->
    <useColdSearcher>false</useColdSearcher>

    <!-- Max Warming Searchers

         Maximum number of searchers that may be warming in the
         background concurrently.  An error is returned if this limit
         is exceeded.

         Recommend values of 1-2 for read-only slaves, higher for
         masters w/o cache warming.
      -->
    <maxWarmingSearchers>2</maxWarmingSearchers>

  </query>




RE: Wordbreak spellchecker excessive breaking.

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
I am not sure why changing spellcheck parameters would prevent your server from restarting.  One thing to check is to see if you have warming queries running that involve spellcheck.  I think I remember from long ago there was (maybe still is) an obscure bug where sometimes it will lock up in rare cases when spellcheck is used in warming queries.  I do not remember exactly what caused this or if it was ever fixed.

Besides that, you might want to post a stack trace or describe what happens when it doesn't restart.  Perhaps someone here will know what the problem is.
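
As an illustration of what to look for, a warming query that involves spellcheck might look like the sketch below in solrconfig.xml. This is a hypothetical example: the query text and the explicit "spellcheck" parameter are made up for illustration and are not taken from the poster's configuration. A warming query may also pick up spellcheck indirectly, through the defaults of the request handler it runs against.

```xml
<!-- Hypothetical sketch: a firstSearcher warming query that turns
     spellcheck on explicitly. The query text is illustrative only. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">some warming query</str>
      <!-- This parameter is what makes the warming query involve spellcheck -->
      <str name="spellcheck">true</str>
    </lst>
  </arr>
</listener>
```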

James Dyer
Ingram Content Group
(615) 213-4311



Re: Wordbreak spellchecker excessive breaking.

Posted by "S.L" <si...@gmail.com>.
James,

Thanks for clearly stating this. I was not able to find it documented
anywhere. Yes, I am using it with another spell checker (Direct) with
collation on. I will try maxChanges and let you know.

On a side note, whenever I change a spellchecker parameter, I need to
delete the Solr data directory and then rebuild the index, because otherwise
my Tomcat instance will not even start. Can you let me know why?

Thanks.





RE: Wordbreak spellchecker excessive breaking.

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
You can do this if you set it up like in the main Solr example:

<lst name="spellchecker">
	<str name="name">wordbreak</str>
	<str name="classname">solr.WordBreakSolrSpellChecker</str>      
	<str name="field">name</str>
	<str name="combineWords">true</str>
	<str name="breakWords">true</str>
	<int name="maxChanges">10</int>
</lst>

The "combineWords" and "breakWords" flags let you tell it which kind of wordbreak correction you want.  "maxChanges" controls the maximum number of words it can break 1 word into, or the maximum number of words it can combine.  It is reasonable to set this to 1 or 2.
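
A minimal sketch of the same entry with a tighter limit, for illustration only. The "name" field is carried over from the example above, and the maxChanges value of 2 follows the suggestion above. The minBreakLength line is an assumption: it is a WordBreakSolrSpellChecker parameter that is not discussed in this thread, it may not be available in every Solr version, and the value 3 is untested here.

```xml
<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">name</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <!-- Break one word into at most 2 words, or combine at most 2 words -->
  <int name="maxChanges">2</int>
  <!-- Assumption: minimum length of each piece when breaking a word,
       intended to avoid single-letter pieces such as "m o b" -->
  <int name="minBreakLength">3</int>
</lst>
```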

The best way to use this is in conjunction with a "regular" spellchecker like DirectSolrSpellChecker.  When used together with the collation functionality, it should take a query like "mob ile" and, depending on what actually returns results from your data, suggest either "mobile" or perhaps "mob lie" or both.  The one thing it cannot do is fix a transposition or misspelling and combine or break words in one shot.  That is, it cannot detect that "mob lie" should become "mobile".
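
A sketch of how the two spellcheckers and collation might be wired together, assuming a Solr 4.x style solrconfig.xml. The dictionary name "default", the handler name "/spell", and the numeric values are illustrative assumptions, not settings confirmed in this thread; the "name" field is taken from the example above.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- "Regular" spellchecker; the name "default" is an assumption -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="field">name</str>
  </lst>
  <!-- Wordbreak spellchecker, as in the example above -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">name</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">2</int>
  </lst>
</searchComponent>

<!-- Hypothetical handler that consults both dictionaries and collates -->
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.collate">true</str>
    <!-- Illustrative values: test collations against the index so only
         those that return results are suggested -->
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

With a setup along these lines, a query such as "mob ile" sent to the hypothetical /spell handler would be expected to come back with collations drawn from both dictionaries, limited to those that actually match documents.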

James Dyer
Ingram Content Group
(615) 213-4311

