Posted to dev@lucene.apache.org by "James Dyer (JIRA)" <ji...@apache.org> on 2010/07/22 22:21:53 UTC

[jira] Created: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Improvements to SpellCheckComponent Collate functionality
---------------------------------------------------------

                 Key: SOLR-2010
                 URL: https://issues.apache.org/jira/browse/SOLR-2010
             Project: Solr
          Issue Type: New Feature
          Components: clients - java, spellchecker
    Affects Versions: 1.4.1
         Environment: Tested against trunk revision 966633
            Reporter: James Dyer
            Priority: Minor


Improvements to SpellCheckComponent Collate functionality

Our project requires a better Spell Check Collator.  I'm contributing this as a patch to get suggestions for improvements and in case there is a broader need for these features.

1. Only return collations that are guaranteed to produce hits if re-queried (with the original fq params also applied).  This is especially helpful when there is more than one correction per query.  The 1.4 behavior does not verify that a particular combination will actually return hits.
2. Provide the option to get multiple collation suggestions.
3. Provide extended collation results, including the number of hits that re-querying will return and a breakdown of each misspelled word and its correction.
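
As a rough illustration of item 1 (this is not the code in the patch), the idea of checking each candidate collation before returning it can be sketched in Python.  Everything named here is an assumption made for the sketch: the function verified_collations, the count_hits callback that stands in for re-querying the index, the naive string replacement, and the order in which combinations are tried.  The two limits cap how many candidates are tried and how many verified collations are returned.

```python
from itertools import islice, product
from typing import Callable, Dict, List, Tuple


def verified_collations(
    query: str,
    suggestions: Dict[str, List[str]],
    count_hits: Callable[[str], int],
    max_tries: int,
    max_collations: int,
) -> List[Tuple[str, int, Dict[str, str]]]:
    """Return up to max_collations rewritten queries that yield at least one hit."""
    words = list(suggestions)
    found = []
    # Each element of the product picks one correction per misspelled word.
    # islice stops after max_tries candidates, whether or not any matched.
    for combo in islice(product(*(suggestions[w] for w in words)), max_tries):
        corrections = dict(zip(words, combo))
        candidate = query
        for wrong, right in corrections.items():
            # Naive replacement; a real collator would use token offsets.
            candidate = candidate.replace(wrong, right)
        hits = count_hits(candidate)  # hypothetical stand-in for a re-query
        if hits > 0:
            found.append((candidate, hits, corrections))
            if len(found) == max_collations:
                break
    return found


# Toy "index": only these two rewritten queries return anything.
fake_index = {"Title:(how AND fails)": 2, "Title:(hope AND faith)": 2}
result = verified_collations(
    "Title:(hopq AND faill)",
    {"hopq": ["hope", "how"], "faill": ["fall", "fails", "faith"]},
    lambda q: fake_index.get(q, 0),
    max_tries=10,
    max_collations=2,
)
```

With a small max_tries the loop can give up before reaching a combination that matches, which is the performance trade-off between few and many tries.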

This patch is similar to what is described in SOLR-507 item #1.  Also, this patch provides a viable workaround for the problem discussed in SOLR-1074.  A dictionary could be created that combines the terms from the multiple fields.  The collator then would prune out any spurious suggestions this would cause.

This patch adds the following spellcheck parameters:

1. spellcheck.maxCollationTries - maximum number of collation possibilities to try before giving up.  Lower values ensure better performance.  Higher values may be necessary to find a collation that can return results.  Default is 0, which maintains backwards-compatible behavior (do not check collations).

2. spellcheck.maxCollations - maximum number of collations to return.  Default is 1, which maintains backwards-compatible behavior.

3. spellcheck.collateExtendedResult - if true, returns an expanded response format detailing the collations found.  Default is false, which maintains backwards-compatible behavior.  When true, output is like this (in context):

<lst name="spellcheck">
	<lst name="suggestions">
		<lst name="hopq">
			<int name="numFound">94</int>
			<int name="startOffset">7</int>
			<int name="endOffset">11</int>
			<arr name="suggestion">
				<str>hope</str>
				<str>how</str>
				<str>hope</str>
				<str>chops</str>
				<str>hoped</str>
				etc
			</arr>
		</lst>
		<lst name="faill">
			<int name="numFound">100</int>
			<int name="startOffset">16</int>
			<int name="endOffset">21</int>
			<arr name="suggestion">
				<str>fall</str>
				<str>fails</str>
				<str>fail</str>
				<str>fill</str>
				<str>faith</str>
				<str>all</str>
				etc
			</arr>
		</lst>
		<lst name="collation">
			<str name="collationQuery">Title:(how AND fails)</str>
			<int name="hits">2</int>
			<lst name="misspellingsAndCorrections">
				<str name="hopq">how</str>
				<str name="faill">fails</str>
			</lst>
		</lst>
		<lst name="collation">
			<str name="collationQuery">Title:(hope AND faith)</str>
			<int name="hits">2</int>
			<lst name="misspellingsAndCorrections">
				<str name="hopq">hope</str>
				<str name="faill">faith</str>
			</lst>
		</lst>
		<lst name="collation">
			<str name="collationQuery">Title:(chops AND all)</str>
			<int name="hits">1</int>
			<lst name="misspellingsAndCorrections">
				<str name="hopq">chops</str>
				<str name="faill">all</str>
			</lst>
		</lst>
	</lst>
</lst>
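
For anyone who wants to try the three parameters together, a request might be assembled roughly as follows.  This is only a sketch: the host, port and handler path are placeholders, the values 10 and 3 are arbitrary, and the q value is inferred from the startOffset/endOffset values in the sample output above rather than taken from a real request.  spellcheck and spellcheck.collate are the existing switches that enable spell checking and collation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

params = {
    "q": "Title:(hopq AND faill)",             # inferred from the sample offsets
    "spellcheck": "true",                      # existing parameter
    "spellcheck.collate": "true",              # existing parameter
    "spellcheck.maxCollationTries": 10,        # try up to 10 combinations
    "spellcheck.maxCollations": 3,             # return at most 3 collations
    "spellcheck.collateExtendedResult": "true",
}

# Placeholder base URL; substitute the real Solr host and handler path.
url = "http://localhost:8983/solr/select?" + urlencode(params)

# Round-trip check that the query string decodes back to the same values.
decoded = parse_qs(urlsplit(url).query)
```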

In addition, SolrJ is updated to include SpellCheckResponse.getCollatedResults(), which returns the expanded Collation format.  getCollatedResult(), which returns a single String, is retained for backwards compatibility.  Other APIs were not changed but will still work provided that spellcheck.collateExtendedResult is false.
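
Clients that do not use SolrJ can read the expanded format directly from the XML.  The sketch below uses Python's standard xml.etree.ElementTree on a trimmed, well-formed copy of the sample output; the function name read_collations and the dictionary keys it returns are made up for the illustration and are not part of any Solr API.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="collation">
      <str name="collationQuery">Title:(how AND fails)</str>
      <int name="hits">2</int>
      <lst name="misspellingsAndCorrections">
        <str name="hopq">how</str>
        <str name="faill">fails</str>
      </lst>
    </lst>
    <lst name="collation">
      <str name="collationQuery">Title:(chops AND all)</str>
      <int name="hits">1</int>
      <lst name="misspellingsAndCorrections">
        <str name="hopq">chops</str>
        <str name="faill">all</str>
      </lst>
    </lst>
  </lst>
</lst>
"""


def read_collations(xml_text):
    """Collect every extended collation entry as a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    collations = []
    for node in root.iter("lst"):
        if node.get("name") != "collation":
            continue
        fixes = node.find("lst[@name='misspellingsAndCorrections']")
        collations.append({
            "query": node.findtext("str[@name='collationQuery']"),
            "hits": int(node.findtext("int[@name='hits']")),
            "corrections": {s.get("name"): s.text for s in fixes.findall("str")},
        })
    return collations


collations = read_collations(SAMPLE)
```

Note that this only handles the extended format; when spellcheck.collateExtendedResult is false the collation is a single string and none of the nested elements are present.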

This will likely not return valid results when using shards.  Supporting shards would require a more robust interaction with the index than what exists in SpellCheckCollator.collate().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


RE: [jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Dyer, James" <Ja...@ingrambook.com>.
Grant,

I saw your comment, and I agree it's probably best to re-query
through a Search Handler, either the existing one with all other
components turned off or a new one just for this purpose.  If
you (or someone else) are not able to implement it this way,
I can probably find a little time in a few weeks.

James Dyer
E-Commerce Systems
Ingram Book Company
(615) 213-4311

-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org] 
Sent: Friday, August 13, 2010 7:34 AM
To: dev@lucene.apache.org
Subject: Re: [jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Hi James,

Did you see my comments on the issue?  



Re: [jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by Grant Ingersoll <gs...@apache.org>.
Hi James,

Did you see my comments on the issue?  

On Aug 11, 2010, at 12:28 AM, Dyer, James wrote:

> Tom,
> 
> I will also need this to work with 1.4.1 within the next month or two, so if no one else back-ports it, I probably will.  I would also like to see this working with shards.  The PossibilityIterator class can likely be made a lot simpler.  If nobody else takes care of these items, I will try to find time to do so myself before making it work with 1.4.1.
> 
> James Dyer
> E-Commerce Systems
> Ingram Book Company
> (615) 213-4311
> 
> -----Original Message-----
> From: Tom Phethean (JIRA) [mailto:jira@apache.org] 
> Sent: Tuesday, August 10, 2010 10:01 AM
> To: dev@lucene.apache.org
> Subject: [jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality
> 
> 
>    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896903#action_12896903 ] 
> 
> Tom Phethean commented on SOLR-2010:
> ------------------------------------
> 
> Ok, thanks. Do you know if there is a rough timescale on that?
> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search




RE: [jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Dyer, James" <Ja...@ingrambook.com>.
Tom,

I will also need this to work with 1.4.1 within the next month or two, so if no one else back-ports it, I probably will.  I would also like to see this working with shards.  The PossibilityIterator class can likely be made a lot simpler.  If nobody else takes care of these items, I will try to find time to do so myself before making it work with 1.4.1.

James Dyer
E-Commerce Systems
Ingram Book Company
(615) 213-4311

-----Original Message-----
From: Tom Phethean (JIRA) [mailto:jira@apache.org] 
Sent: Tuesday, August 10, 2010 10:01 AM
To: dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality


    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896903#action_12896903 ] 

Tom Phethean commented on SOLR-2010:
------------------------------------

Ok, thanks. Do you know if there is a rough timescale on that?

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch
>
>


RE: [jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Dyer, James" <Ja...@ingrambook.com>.
We were working with two versions of the functionality and decided that the one that doesn't require modifications to SearchHandler and ResponseBuilder would perform better in most situations.  So you won't see any changes to these two classes.  The functionality should work.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: JAYABAALAN V (JIRA) [mailto:jira@apache.org] 
Sent: Friday, October 15, 2010 2:36 AM
To: dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality


    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921264#action_12921264 ] 

JAYABAALAN V commented on SOLR-2010:
------------------------------------

Under revision 1021439 I am able to download only these four Java classes: SpellCheckComponent, SpellCheckResponse, SpellingParams, and TestSpellCheckResponse.  The other Java classes, ResponseBuilder.java and SearchHandler.java, are not updated.

Let me know the correct path for these two Java classes for revision 1021439.


> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch
>
>


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923463#action_12923463 ] 

James Dyer commented on SOLR-2010:
----------------------------------

Great.  Thank you.  

The 1.4 patch is mostly for my benefit, so we can use the functionality before the next release.  Thought I'd share that with anyone else who wants to try it too...


> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: multiple_collations_as_an_array.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch, solr_2010_3x.patch
>
>


[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2010:
-----------------------------

    Attachment: SOLR-2010.txt

Second version of patch.  Updated to trunk rev #986945.

Adds support for shards.  I originally implemented this by passing the SearchHandler to the SpellCheckComponent and then using an overloaded version of SearchHandler.handleRequestBody() to do the re-queries.  I found this was unnecessary as we get the same results by calling the QueryComponent directly.  

I added some test scenarios to "DistributedSpellCheckComponentTest" and they all pass.  However, I am a bit disturbed to find that the test fails if I uncomment the constructor (added with this patch).  The constructor simply tells it to test only with 4 shards rather than trying 1 shard, then 2, etc.  Either way, the 4-shard test sends the same docs to the same shards, yet the results differ.  Specifically, the ranking/ordering of the collations returned and the # of hits reported are sometimes wrong when the constructor is called before the test.  Unfortunately, I am at a loss as to why I get inconsistent results here, and anyone's assistance on this would be most helpful.

I also added an additional unit test method to verify this works when multiple request handlers are configured with different "qf" params.  I also added a unit test method that verifies this works when "fq" is set.
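For readers following along, here is a minimal sketch of the re-query idea behind the collator, written against plain Java collections rather than the Solr classes touched by the patch.  It assumes candidate collations are tried in simple "odometer" order over the per-word suggestion lists; the ordering used by the actual patch may differ.  The class name CollationSketch, the hitCounter callback (which stands in for a re-query that also applies the original fq params) and the demo data are all hypothetical.  Only the meaning of maxCollationTries and maxCollations is taken from the issue description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

final class CollationSketch {

    /** A candidate collation and the hit count its re-query reported (-1 if unchecked). */
    static final class Collation {
        final List<String> terms;
        final int hits;
        Collation(List<String> terms, int hits) { this.terms = terms; this.hits = hits; }
    }

    /**
     * Tries combinations of corrections, keeping only those the hitCounter says return hits.
     * suggestions holds one ranked correction list per misspelled word.
     */
    static List<Collation> collate(List<List<String>> suggestions,
                                   ToIntFunction<List<String>> hitCounter,
                                   int maxTries, int maxCollations) {
        List<Collation> found = new ArrayList<>();
        if (suggestions.isEmpty() || maxCollations < 1) return found;
        for (List<String> s : suggestions) if (s.isEmpty()) return found;

        int[] idx = new int[suggestions.size()];   // current pick for each misspelled word
        int tries = 0;
        while (true) {
            List<String> candidate = new ArrayList<>();
            for (int i = 0; i < idx.length; i++) candidate.add(suggestions.get(i).get(idx[i]));

            if (maxTries == 0) {                   // backwards-compatible mode: do not check
                found.add(new Collation(candidate, -1));
                return found;
            }
            int hits = hitCounter.applyAsInt(candidate);   // stands in for the re-query
            tries++;
            if (hits > 0) found.add(new Collation(candidate, hits));
            if (found.size() >= maxCollations || tries >= maxTries) return found;

            // Advance to the next combination; the rightmost word varies fastest.
            int pos = idx.length - 1;
            while (pos >= 0 && ++idx[pos] == suggestions.get(pos).size()) {
                idx[pos] = 0;
                pos--;
            }
            if (pos < 0) return found;             // every combination has been tried
        }
    }

    /** Hypothetical suggestions for the misspellings "hopq" and "faill". */
    static List<List<String>> demoSuggestions() {
        return List.of(List.of("hope", "how", "chops"), List.of("fall", "fails", "faith", "all"));
    }

    /** Hypothetical index: only three corrected queries return any hits. */
    static ToIntFunction<List<String>> demoHitCounter() {
        Map<String, Integer> hitsByQuery = Map.of("how fails", 2, "hope faith", 2, "chops all", 1);
        return terms -> hitsByQuery.getOrDefault(String.join(" ", terms), 0);
    }

    public static void main(String[] args) {
        for (Collation c : collate(demoSuggestions(), demoHitCounter(), 12, 3)) {
            System.out.println(String.join(" AND ", c.terms) + " hits=" + c.hits);
        }
    }
}
```

With a budget of 12 tries the sketch finds all three verified collations from the toy data.  With a budget of 5 it gives up after finding only the first, which is the performance trade-off the maxCollationTries description refers to.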



> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt
>
>


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Tom Phethean (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896903#action_12896903 ] 

Tom Phethean commented on SOLR-2010:
------------------------------------

Ok, thanks. Do you know if there is a rough timescale on that?

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch
>
>


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900246#action_12900246 ] 

Grant Ingersoll commented on SOLR-2010:
---------------------------------------

{quote}Adds support for shards. I originally implemented this by passing the SearchHandler to the SpellCheckComponent and then using an overloaded version of SearchHandler.handleRequestBody() to do the re-queries. I found this was unnecessary as we get the same results by calling the QueryComponent directly. 
{quote}

I haven't taken a look at the patch yet, but by the sounds of it, I still think the cleaner way to go is to make Solr have an option to specifically pass in which component to run and turn off all others.  This would be useful for other things, too.  Then you could just use the existing mechanisms.
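As a rough illustration of that suggestion, the sketch below models a handler that normally runs every component in its chain but can be told to run exactly one.  It is not Solr code: the class ComponentSelectionSketch, the onlyComponent argument and the stand-in components are invented for this example, and no such request option is claimed to exist in Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class ComponentSelectionSketch {

    /**
     * Runs the components in chain order.  When onlyComponent is non-null,
     * every component with a different name is skipped.
     */
    static Map<String, Object> handle(Map<String, Function<String, Object>> chain,
                                      String query, String onlyComponent) {
        Map<String, Object> response = new LinkedHashMap<>();
        for (Map.Entry<String, Function<String, Object>> e : chain.entrySet()) {
            if (onlyComponent != null && !onlyComponent.equals(e.getKey())) continue;
            response.put(e.getKey(), e.getValue().apply(query));
        }
        return response;
    }

    /** Hypothetical chain: each component is reduced to a function of the query string. */
    static Map<String, Function<String, Object>> demoChain() {
        Map<String, Function<String, Object>> chain = new LinkedHashMap<>();
        chain.put("query", q -> Integer.valueOf(q.length()));   // stand-in for a hit count
        chain.put("facet", q -> "facets for " + q);
        chain.put("spellcheck", q -> "suggestions for " + q);
        return chain;
    }

    public static void main(String[] args) {
        // A collation re-query would only need the query component's output.
        System.out.println(handle(demoChain(), "how fails", "query"));
        // A normal request runs the whole chain.
        System.out.println(handle(demoChain(), "how fails", null));
    }
}
```

Under this design the collator could issue its re-queries through the ordinary request path and simply name the query component, instead of calling that component directly.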

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt
>
>


[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-2010:
----------------------------------

    Attachment: SOLR-2010.patch

Added license headers

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch
>
>


[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2010:
-----------------------------

    Attachment: SOLR-2010_141.patch

This version is for v1.4.1.  No shard support as SpellCheckComponent does not have any distributed support in 1.4.  All tests pass.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch
>
>


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916158#action_12916158 ] 

James Dyer commented on SOLR-2010:
----------------------------------

The patch file SOLR-2010_141.patch should apply cleanly to v1.4.1.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922668#action_12922668 ] 

Grant Ingersoll commented on SOLR-2010:
---------------------------------------

I already did the merge to 3.x.  I don't believe Yonik has backported his fix yet.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch, solr_2010_3x.patch


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916543#action_12916543 ] 

Grant Ingersoll commented on SOLR-2010:
---------------------------------------

Makes sense.  I'd say we stick w/ the recombine approach.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch


[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2010:
-----------------------------

    Attachment: multiple_collations_as_an_array.patch

Here's an attempt to implement Yonik's suggestion to have multiple collations returned as an Array rather than use repeated keys.  I am not familiar with JSON so I didn't realize the original format would cause problems.  

From this perspective, however, I like the original version better.  The problem is that, in order to maintain backwards-compatibility, if "spellcheck.maxCollations" is unset or set to "1", we need to return a single String with key "collation".  This patch alters the response only if "spellcheck.maxCollations" is >1, in which case it instead returns an array with key "collations".

I also changed the distributed code and solrj to cope with the change in format.  All tests pass, but maybe someone will find a better solution than this, or perhaps we can leave it as is.
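
As a rough illustration of the format switch described above, here is a minimal sketch.  It is not code from the patch: the class name CollationFormatSketch and the method buildResponse are hypothetical, and a plain Map stands in for Solr's response structure.  Only the key names "collation" and "collations" and the maxCollations threshold come from this comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not code from the patch.
class CollationFormatSketch {

    // Builds the collation part of a response from collations already found.
    // maxCollations mirrors the spellcheck.maxCollations request parameter.
    static Map<String, Object> buildResponse(List<String> found, int maxCollations) {
        Map<String, Object> response = new LinkedHashMap<>();
        if (found.isEmpty()) {
            return response; // nothing to report
        }
        if (maxCollations <= 1) {
            // Backwards-compatible shape: a single String under "collation".
            response.put("collation", found.get(0));
        } else {
            // New shape: one "collations" key holding an array, so no key repeats.
            int n = Math.min(maxCollations, found.size());
            response.put("collations", new ArrayList<>(found.subList(0, n)));
        }
        return response;
    }

    public static void main(String[] args) {
        List<String> found = Arrays.asList(
                "Title:(how AND fails)", "Title:(hope AND faith)", "Title:(chops AND all)");
        System.out.println(buildResponse(found, 1)); // single-String shape
        System.out.println(buildResponse(found, 2)); // array shape
    }
}
```

The cost of this design is that a client has to check which key is present.  The benefit is that a JSON client never sees the same key twice in one object, which many JSON parsers handle by keeping only one of the values.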

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: multiple_collations_as_an_array.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch, solr_2010_3x.patch


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921691#action_12921691 ] 

Yonik Seeley commented on SOLR-2010:
------------------------------------

This patch introduced some resource leaks - whenever it tries to verify the collated result by doing a search, it never closes the request and hence the searcher is never closed.
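
The kind of leak described above, and the usual remedy, can be sketched as follows.  This is only an illustration under stated assumptions: FakeRequest, openSearchers and verifyCollation are hypothetical stand-ins, not Solr classes or the actual fix.  The point is simply that the request used for the verification search is closed on every path, so the searcher reference it holds is released.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins; these are not Solr classes.
class LeakFixSketch {

    // Counts searcher references that are currently held open.
    static final AtomicInteger openSearchers = new AtomicInteger();

    // Stand-in for a request that holds a searcher reference until it is closed.
    static class FakeRequest implements AutoCloseable {
        private boolean closed;

        FakeRequest() {
            openSearchers.incrementAndGet();
        }

        // Pretend search: reports one hit for any non-empty query.
        int countHits(String query) {
            return query.isEmpty() ? 0 : 1;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                openSearchers.decrementAndGet();
            }
        }
    }

    // Verifies one candidate collation and always releases the request,
    // even if the search throws.
    static int verifyCollation(String collationQuery) {
        try (FakeRequest req = new FakeRequest()) {
            return req.countHits(collationQuery);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            verifyCollation("Title:(hope AND faith)");
        }
        System.out.println("open searchers: " + openSearchers.get());
    }
}
```

Without the try-with-resources block (or an equivalent finally clause), each verification attempt would leave one reference open, so the leak would grow with spellcheck.maxCollationTries.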

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch


[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2010:
-----------------------------

    Attachment: SOLR-2010_shardSearchHandler_999521.patch
                SOLR-2010_shardRecombineCollations_999521.patch

Both patch versions sync'ed to Trunk version 999521. (sorry about the many filename variants)

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "JAYABAALAN V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919800#action_12919800 ] 

JAYABAALAN V commented on SOLR-2010:
------------------------------------

What I mean is that the correct version of the code has not been updated; in particular, the three final String declarations have not been updated at this path:

http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/common/org/apache/solr/common/params/SpellingParams.java

For that reason, I am asking for the correct steps, or the correct source path, to download from.

Could you also provide details on the schema and Solr configuration side? I think that would help.



> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch
>
>



[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Tom Phethean (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896898#action_12896898 ] 

Tom Phethean commented on SOLR-2010:
------------------------------------

This sounds like a really useful patch. I would definitely like to see it go further, as it would be useful for a project I'm currently working on. I have just tried to apply it against 1.4.1 (downloaded today) and got the following errors:

patching file solr/src/test/org/apache/solr/spelling/SpellPossibilityIteratorTest.java
patching file solr/src/test/org/apache/solr/spelling/SpellCheckCollatorTest.java
patching file solr/src/test/org/apache/solr/client/solrj/response/TestSpellCheckResponse.java
Hunk #1 FAILED at 20.
Hunk #2 FAILED at 103.
2 out of 2 hunks FAILED -- saving rejects to file solr/src/test/org/apache/solr/client/solrj/response/TestSpellCheckResponse.java.rej
patching file solr/src/java/org/apache/solr/handler/component/SpellCheckComponent.java
Hunk #1 FAILED at 132.
Hunk #2 FAILED at 361.
Hunk #3 FAILED at 405.
Hunk #4 FAILED at 452.
Hunk #5 FAILED at 466.
5 out of 5 hunks FAILED -- saving rejects to file solr/src/java/org/apache/solr/handler/component/SpellCheckComponent.java.rej
patching file solr/src/java/org/apache/solr/spelling/SpellCheckCollation.java
patching file solr/src/java/org/apache/solr/spelling/PossibilityIterator.java
patching file solr/src/java/org/apache/solr/spelling/SpellCheckCorrection.java
patching file solr/src/java/org/apache/solr/spelling/SpellCheckCollator.java
patching file solr/src/common/org/apache/solr/common/params/SpellingParams.java
Hunk #1 FAILED at 78.
1 out of 1 hunk FAILED -- saving rejects to file solr/src/common/org/apache/solr/common/params/SpellingParams.java.rej
patching file solr/src/solrj/org/apache/solr/client/solrj/response/SpellCheckResponse.java
Hunk #1 FAILED at 31.
Hunk #2 FAILED at 46.
Hunk #3 FAILED at 77.
Hunk #4 FAILED at 162.
4 out of 4 hunks FAILED -- saving rejects to file solr/src/solrj/org/apache/solr/client/solrj/response/SpellCheckResponse.java.rej


> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch
>
>



[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2010:
-----------------------------

    Attachment: SOLR-2010.patch

New Patch Version with Shard Support.  Grant, I hope I'm getting closer to what you have in mind this time around.

I think I've figured out how to send the collation test queries back to SearchHandler and have it take care of querying the shards individually.  Then the collation logic is no different for distributed / non-distributed.

As I would like to eventually use this in production here, any comments as to how to further make this a "production-quality" feature are much appreciated.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt
>
>



[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919851#action_12919851 ] 

James Dyer commented on SOLR-2010:
----------------------------------

Grant,

Is there anything else you'd like me to do with this?  Is this something you think should be committed?

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch
>
>



[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904264#action_12904264 ] 

Grant Ingersoll commented on SOLR-2010:
---------------------------------------

Hi James,

First off, good work.  I like the overall design, etc.

Second, this patch no longer applies cleanly to trunk.  The issue is in the SearchHandler.

Third, in thinking some more about the whole distributed case, perhaps we are approaching this wrong.  I was originally thinking that we would have to go off and re-query all the shards (as in send another message) but we really shouldn't have to do that, right?  Why can't we just pass the collation request through to the shards as part of the get suggestions, so that each shard can, if collation is asked for, return its collation suggestions?  Then, the question becomes how to merge the suggestions and pick the best one.  This should save a round trip at the cost of doing some extra collations, but since most people aren't going to ask for more than 5 or 10, it shouldn't be an issue.

-Grant

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt
>
>



[jira] Issue Comment Edited: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907304#action_12907304 ] 

James Dyer edited comment on SOLR-2010 at 9/8/10 12:50 PM:
-----------------------------------------------------------

Two new versions of the patch:

1. SOLR-2010_shardSearchHandler_993538.patch is the same as the 8/23/2010 version except it applies cleanly to trunk revision #993538.  In a Distributed setup, this version calls an overloaded method on SearchHandler to use its logic for combining results from the collation test queries.  This is simpler code but requires many more round-trips between shards.  We also can guarantee that a Distributed setup will always return the exact same collations in order as a non-Distributed setup.  

2. SOLR-2010_shardRecombineCollations_993538.patch is similar to the 8/19/2010 version, with improvements.  This version also applies cleanly to trunk revision #993538.  In a Distributed setup, each shard calls QueryComponent individually and generates its own list of Collations.  The SpellCheckComponent then combines and sorts the resulting collations, returning the best ones, up to the client-specified maximum.  This requires more complicated logic in SpellCheckComponent.finishStage(), although it does not necessitate changes to SearchHandler or ResponseBuilder.  It may be possible to find cases where a Distributed setup may return different collations--or the same collations in a different order--than a non-distributed setup.  I do not believe this potential disparity would ever be very significant.

Grant, I believe version 1 is something like what you were thinking of on 8/9 and 8/19.  Version 2 is more like what you describe in your comment from 8/30.  Let me know if you think this needs any more tweaking.  ALSO, if you're thinking of possibly committing this someday, you may want to look at SOLR-2083 also.  Based on my understanding, distributed SpellCheckComponent as exists currently in Trunk is broken.  (If I'm right), we may want to fix it before adding on more functionality.
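
The recombining step described for version 2 could be sketched roughly as follows. This is an illustration only, not the code in the patch: ShardCollationMerger and its Collation type are hypothetical names, and the ranking rule (sum the hits reported by each shard for the same collation query, then sort by total hits) is an assumption made for the example. The patch itself may rank collations differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not Solr code): merge per-shard collation lists. */
final class ShardCollationMerger {

    /** One collation as reported by a single shard; illustrative type. */
    static final class Collation {
        final String query;
        final long hits;

        Collation(String query, long hits) {
            this.query = query;
            this.hits = hits;
        }
    }

    /**
     * Combines the collations from every shard and returns at most
     * maxCollations entries, ranked by total hits (assumed ranking rule).
     */
    static List<Collation> merge(List<List<Collation>> perShard, int maxCollations) {
        // Sum hits for identical collation queries; insertion order is kept
        // so that ties are broken by which collation was seen first.
        Map<String, Long> totals = new LinkedHashMap<>();
        for (List<Collation> shard : perShard) {
            for (Collation c : shard) {
                totals.merge(c.query, c.hits, Long::sum);
            }
        }
        List<Collation> merged = new ArrayList<>();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            merged.add(new Collation(e.getKey(), e.getValue()));
        }
        // List.sort is stable, so equal totals keep first-seen order.
        merged.sort(Comparator.comparingLong((Collation c) -> c.hits).reversed());
        int limit = Math.max(0, Math.min(maxCollations, merged.size()));
        return new ArrayList<>(merged.subList(0, limit));
    }
}
```

Because each shard only reports the collations it found locally, a merge of this kind can pick a different set or order than a single-index search would, which is the disparity mentioned above.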

      was (Author: jdyer):
    Two new versions of the patch:

1. SOLR-2010_shardSearchHandler_993538.patch is the same as the 8/23/2010 version except it applies cleanly to trunk revision #993538.  In a Distributed setup, this version calls an overloaded method on SearchHandler to use its logic for combining results from the collation test queries.  This is simpler code but requires many more round-trips between shards.  We also can guarantee that a Distributed setup will always return the exact same collations in order as a non-Distributed setup.  

2. SOLR-2010_shardRecombineCollations_993538.patch is similar to the 8/19/2010 version, with improvements.  This version also applies cleanly to trunk revision #993538.  In a Distributed setup, each shard calls QueryComponent individually and generates its own list of Collations.  The SpellCheckComponent then combines and sorts the resulting collations, returning the best ones, up to the client-specified maximum.  This requires more complicated logic in SpellCheckComponent.finishStage(), although it does not necessitate changes to SearchHandler or ResponseBuilder.  It may be possible to find cases where a Distributed setup may return different collations--or the same collations in a different order--than a non-distributed setup.  I do not believe this potential disparity would ever be very significant.

Grant, I believe version 1 is something like what you were thinking of on 8/9 and 8/19.  Version 2 is more like what you describe in your comment from 8/30.  Let me know if you think this needs any more tweaking.  ALSO, if you're thinking of possibly committing this someday, you may want to look at SOLR-2049 also.  Based on my understanding, distributed SpellCheckComponent as exists currently in Trunk is broken.  (If I'm right), we may want to fix it before adding on more functionality.
  
> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardSearchHandler_993538.patch
>
>
> Improvements to SpellCheckComponent Collate functionality
> Our project requires a better Spell Check Collator.  I'm contributing this as a patch to get suggestions for improvements and in case there is a broader need for these features.
> 1. Only return collations that are guaranteed to result in hits if re-queried (applying original fq params also).  This is especially helpful when there is more than one correction per query.  The 1.4 behavior does not verify that a particular combination will actually return hits.
> 2. Provide the option to get multiple collation suggestions
> 3. Provide extended collation results including the # of hits re-querying will return and a breakdown of each misspelled word and its correction.
> This patch is similar to what is described in SOLR-507 item #1.  Also, this patch provides a viable workaround for the problem discussed in SOLR-1074.  A dictionary could be created that combines the terms from the multiple fields.  The collator then would prune out any spurious suggestions this would cause.
> This patch adds the following spellcheck parameters:
> 1. spellcheck.maxCollationTries - maximum # of collation possibilities to try before giving up.  Lower values ensure better performance.  Higher values may be necessary to find a collation that can return results.  Default is 0, which maintains backwards-compatible behavior (do not check collations).
> 2. spellcheck.maxCollations - maximum # of collations to return.  Default is 1, which maintains backwards-compatible behavior.
> 3. spellcheck.collateExtendedResult - if true, returns an expanded response format detailing collations found.  default is false, which maintains backwards-compatible behavior.  When true, output is like this (in context):
> <lst name="spellcheck">
> 	<lst name="suggestions">
> 		<lst name="hopq">
> 			<int name="numFound">94</int>
> 			<int name="startOffset">7</int>
> 			<int name="endOffset">11</int>
> 			<arr name="suggestion">
> 				<str>hope</str>
> 				<str>how</str>
> 				<str>hope</str>
> 				<str>chops</str>
> 				<str>hoped</str>
> 				etc
> 			</arr>
> 		</lst>
> 		<lst name="faill">
> 			<int name="numFound">100</int>
> 			<int name="startOffset">16</int>
> 			<int name="endOffset">21</int>
> 			<arr name="suggestion">
> 				<str>fall</str>
> 				<str>fails</str>
> 				<str>fail</str>
> 				<str>fill</str>
> 				<str>faith</str>
> 				<str>all</str>
> 				etc
> 			</arr>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(how AND fails)</str>
> 			<int name="hits">2</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">how</str>
> 				<str name="faill">fails</str>
> 			</lst>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(hope AND faith)</str>
> 			<int name="hits">2</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">hope</str>
> 				<str name="faill">faith</str>
> 			</lst>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(chops AND all)</str>
> 			<int name="hits">1</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">chops</str>
> 				<str name="faill">all</str>
> 			</lst>
> 		</lst>
> 	</lst>
> </lst>
> In addition, SOLRJ is updated to include SpellCheckResponse.getCollatedResults(), which will return the expanded Collation format.  getCollatedResult(), which returns a single String, is retained for backwards-compatibility.  Other APIs were not changed but will still work provided that spellcheck.collateExtendedResult is false.
> This likely will not return valid results if using Shards.  Rather, a more robust interaction with the index would be necessary than what exists in SpellCheckCollator.collate().
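
As a rough illustration of how the three new parameters might be sent together, the sketch below builds a request query string using only the JDK.  The class and method names are hypothetical.  The `q`, `spellcheck` and `spellcheck.collate` parameters are standard Solr request parameters that the description above does not list, and the query value is only inferred from the offsets in the sample output, so treat both as assumptions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a hypothetical helper, not part of Solr or SolrJ.
final class SpellcheckRequest {

    // Builds a URL-encoded query string carrying the three parameters added by this patch.
    static String queryString(String q, int maxCollationTries, int maxCollations,
                              boolean collateExtendedResult) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("spellcheck", "true");
        params.put("spellcheck.collate", "true");
        params.put("spellcheck.maxCollationTries", Integer.toString(maxCollationTries));
        params.put("spellcheck.maxCollations", Integer.toString(maxCollations));
        params.put("spellcheck.collateExtendedResult", Boolean.toString(collateExtendedResult));

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 support is required of every JVM, so this should not happen.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // Try up to 10 candidate collations, return at most 3, in the extended format.
        System.out.println(queryString("Title:(hopq AND faill)", 10, 3, true));
    }
}
```

Leaving spellcheck.maxCollationTries at 0 and spellcheck.maxCollations at 1 corresponds to the backwards-compatible defaults described above.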

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "JAYABAALAN V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919469#action_12919469 ] 

JAYABAALAN V commented on SOLR-2010:
------------------------------------

Thanks for your direction.

Based on your input, I tried the trunk and downloaded SOLR-2010_shardRecombineCollations_999521.patch.

I looked at this path:
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/common/org/apache/solr/common/params/SpellingParams.java

However, SpellingParams.java at this path does not look updated.  The patch mainly adds three final String values ("maxCollations", "maxCollationTries", and "collateExtendedResults"), but the file's history makes it look like the Solr v1.3 version.

Please let me know the path from which I can download the updated version.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch
>
>



[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923004#action_12923004 ] 

Grant Ingersoll commented on SOLR-2010:
---------------------------------------

James, you are right.  I mislabeled my merge.  Still getting used to this merge from trunk to branch stuff.  At any rate, no need for a patch, I will get the merge figured out soon.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch, solr_2010_3x.patch
>
>



[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919569#action_12919569 ] 

James Dyer commented on SOLR-2010:
----------------------------------

I tested "SOLR-2010_shardRecombineCollations_999521.patch" with current trunk and it still applies cleanly.  I'm not sure why SpellingParams.java isn't updating for you.  Perhaps try again?

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311




> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch
>
>



[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924397#action_12924397 ] 

Yonik Seeley commented on SOLR-2010:
------------------------------------

Still stuff messed up with the merge props I guess - when I try to merge in Robert's fixes, it does nothing (it thinks they are already merged).
I guess I just need to copy the file at this point.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: multiple_collations_as_an_array.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch, solr_2010_3x.patch
>
>



[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916533#action_12916533 ] 

James Dyer commented on SOLR-2010:
----------------------------------

Grant,

It wouldn't be difficult to create an uber-patch that allows users to pick which way to go. If that's the route you want to go then I'd be happy to do that. However, I think it would be best to stick with the "recombine" approach because although you'll get throw-away collations, it will always be done internally within the shard. The performance penalty in most cases will be slight. On the other hand, if using the "Search Handler" approach, it has to query over the network for each *try*, which could be significant. I wouldn't say that you would never benefit from the "Search Handler" option, but I wonder if it warrants extra lines of code and making changes to the SearchHandler class, etc.

Unfortunately I haven't done any performance testing with these. We only are in early development here with SOLR and I don't have access to multiple servers with which I can easily deploy such a test. On a non-distributed setup this patch only adds a little bit of overhead, and I wouldn't expect the "recombine" option to be much worse than that.

Note that with either approach I'd imagine you'd frequently run into the case where some/many shards simply do not have the documents the user is looking for, so they will query up to "spellcheck.maxCollationTries" times and still come up empty. In that case the shard(s) that get the results may need to wait for the shards that are busy querying away in vain...
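
To illustrate that worst case, here is a rough sketch of a bounded "try" loop.  It is not the code from the patch: the names are hypothetical, the hit counter stands in for whatever test query a shard would run, and treating maxCollationTries <= 0 as "return candidates unchecked" is my reading of the backwards-compatible default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

// Illustrative only: these names are hypothetical and do not come from the patch.
final class CollationTester {

    // Tests candidate collations in order and keeps those that return at least one hit.
    // Stops after maxCollationTries test queries or once maxCollations have been verified.
    static List<String> verify(List<String> candidates, ToLongFunction<String> hitCounter,
                               int maxCollationTries, int maxCollations) {
        List<String> verified = new ArrayList<>();
        if (maxCollationTries <= 0) {
            // Assumed backwards-compatible mode: no test queries, candidates pass through unchecked.
            for (String candidate : candidates) {
                if (verified.size() >= maxCollations) {
                    break;
                }
                verified.add(candidate);
            }
            return verified;
        }
        int tries = 0;
        for (String candidate : candidates) {
            if (tries >= maxCollationTries || verified.size() >= maxCollations) {
                break;
            }
            tries++;
            if (hitCounter.applyAsLong(candidate) > 0) {
                verified.add(candidate);
            }
        }
        return verified;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList(
                "Title:(hope AND fall)", "Title:(how AND fails)", "Title:(hope AND faith)");
        int[] queriesRun = {0};
        // A shard that holds none of the matching documents: every test query returns 0 hits.
        List<String> result = verify(candidates, q -> { queriesRun[0]++; return 0L; }, 3, 1);
        System.out.println(result + " after " + queriesRun[0] + " test queries");
    }
}
```

A shard like the one in main() runs every allowed test query and still contributes nothing, which is why lower values of the tries limit keep the penalty small.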

Let me know if you want an "uber-patch". I might have a little time later today if you let me know.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch
>
>
> Improvements to SpellCheckComponent Collate functionality
> Our project requires a better Spell Check Collator.  I'm contributing this as a patch to get suggestions for improvements and in case there is a broader need for these features.
> 1. Only return collations that are guaranteed to result in hits if re-queried (applying original fq params also).  This is especially helpful when there is more than one correction per query.  The 1.4 behavior does not verify that a particular combination will actually return hits.
> 2. Provide the option to get multiple collation suggestions
> 3. Provide extended collation results including the # of hits re-querying will return and a breakdown of each misspelled word and its correction.
> This patch is similar to what is described in SOLR-507 item #1.  Also, this patch provides a viable workaround for the problem discussed in SOLR-1074.  A dictionary could be created that combines the terms from the multiple fields.  The collator then would prune out any spurious suggestions this would cause.
> This patch adds the following spellcheck parameters:
> 1. spellcheck.maxCollationTries - maximum # of collation possibilities to try before giving up.  Lower values ensure better performance.  Higher values may be necessary to find a collation that can return results.  Default is 0, which maintains backwards-compatible behavior (do not check collations).
> 2. spellcheck.maxCollations - maximum # of collations to return.  Default is 1, which maintains backwards-compatible behavior.
> 3. spellcheck.collateExtendedResult - if true, returns an expanded response format detailing collations found.  default is false, which maintains backwards-compatible behavior.  When true, output is like this (in context):
> <lst name="spellcheck">
> 	<lst name="suggestions">
> 		<lst name="hopq">
> 			<int name="numFound">94</int>
> 			<int name="startOffset">7</int>
> 			<int name="endOffset">11</int>
> 			<arr name="suggestion">
> 				<str>hope</str>
> 				<str>how</str>
> 				<str>hope</str>
> 				<str>chops</str>
> 				<str>hoped</str>
> 				etc
> 			</arr>
> 		</lst>
> 		<lst name="faill">
> 			<int name="numFound">100</int>
> 			<int name="startOffset">16</int>
> 			<int name="endOffset">21</int>
> 			<arr name="suggestion">
> 				<str>fall</str>
> 				<str>fails</str>
> 				<str>fail</str>
> 				<str>fill</str>
> 				<str>faith</str>
> 				<str>all</str>
> 				etc
> 			</arr>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(how AND fails)</str>
> 			<int name="hits">2</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">how</str>
> 				<str name="faill">fails</str>
> 			</lst>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(hope AND faith)</str>
> 			<int name="hits">2</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">hope</str>
> 				<str name="faill">faith</str>
> 			</lst>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(chops AND all)</str>
> 			<int name="hits">1</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">chops</str>
> 				<str name="faill">all</str>
> 			</lst>
> 		</lst>
> 	</lst>
> </lst>
> In addition, SolrJ is updated to include SpellCheckResponse.getCollatedResults(), which returns the expanded Collation format.  getCollatedResult(), which returns a single String, is retained for backwards-compatibility.  Other APIs were not changed and will still work provided that spellcheck.collateExtendedResult is false.
> This will likely not return valid results when using shards.  That would require a more robust interaction with the index than what exists in SpellCheckCollator.collate().
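
As a rough illustration of the three parameters described above, the following sketch assembles a request query string using only the Java standard library.  The class and method names are hypothetical, the values 10 and 3 are arbitrary, and the query Title:(hopq AND faill) is an assumption inferred from the offsets in the sample response.  spellcheck and spellcheck.collate are the existing parameters that switch on spellchecking and collation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Solr or SolrJ.
class CollationQueryParams {

    /** Builds a URL-encoded query string that asks for verified, extended collations. */
    static String buildQuery(String q, int maxTries, int maxCollations, boolean extended) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("spellcheck", "true");
        params.put("spellcheck.collate", "true");
        // Parameters introduced by this issue:
        params.put("spellcheck.maxCollationTries", Integer.toString(maxTries));
        params.put("spellcheck.maxCollations", Integer.toString(maxCollations));
        params.put("spellcheck.collateExtendedResult", Boolean.toString(extended));

        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Assumed original query; try up to 10 combinations, return at most 3.
        System.out.println(buildQuery("Title:(hopq AND faill)", 10, 3, true));
    }
}
```

Leaving spellcheck.maxCollationTries at its default of 0 keeps the old behavior, in which collations are not checked against the index.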

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919289#action_12919289 ] 

James Dyer commented on SOLR-2010:
----------------------------------

To use v1.4.1, either get the GA source or check out the 1.4.x branch at http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4 and then apply the patch file attached to this JIRA.  The correct patch for 1.4.1 is SOLR-2010_141.patch.

To use trunk, check out the source at http://svn.apache.org/repos/asf/lucene/dev/trunk and then apply the patch.  The correct patch file for trunk is SOLR-2010_shardRecombineCollations_999521.patch.

There are instructions on applying patches in the wiki:  http://wiki.apache.org/solr/HowToContribute#Working_With_Patches


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919966#action_12919966 ] 

Grant Ingersoll commented on SOLR-2010:
---------------------------------------

Committed to trunk in revision 1021439.  Working on backporting to 3.x.

James, can you add docs to the wiki for the new parameters?


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "JAYABAALAN V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916132#action_12916132 ] 

JAYABAALAN V commented on SOLR-2010:
------------------------------------

I am using Apache Solr 1.4 and need to download this patch for my implementation.  Please let me know the latest revision number.


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896899#action_12896899 ] 

Grant Ingersoll commented on SOLR-2010:
---------------------------------------

The patch is currently for trunk.  We will most likely work it out on trunk first and then backport.


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921774#action_12921774 ] 

Yonik Seeley commented on SOLR-2010:
------------------------------------

Is this the patch that added multiple collations?

I'm seeing stuff like this:
{code}
  {...
      "collation":"lowerfilt:(+faith +hope +loaves)",
      "collation":"lowerfilt:(+faith +hope +love)"}}}
{code}

We should probably change collation to be an array instead.  Repeated keys in JSON are legal, but not nice to deal with.
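
To illustrate why repeated keys are awkward, here is a small sketch under the assumption that a client loads a JSON object into a map, as many parsers do.  It uses plain Java collections rather than a real JSON parser, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; it only mimics how a client might store the response.
class RepeatedKeyDemo {

    // The two "collation" entries from the response above, as key/value pairs.
    static final String[][] PAIRS = {
        {"collation", "lowerfilt:(+faith +hope +loaves)"},
        {"collation", "lowerfilt:(+faith +hope +love)"}
    };

    /** Stores the pairs in a map: a later duplicate key overwrites the earlier one. */
    static Map<String, String> loadAsObject(String[][] pairs) {
        Map<String, String> object = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            object.put(pair[0], pair[1]);
        }
        return object;
    }

    /** Collects every value for one key into a list, as an array-valued field would. */
    static List<String> loadAsArray(String[][] pairs, String key) {
        List<String> values = new ArrayList<>();
        for (String[] pair : pairs) {
            if (pair[0].equals(key)) {
                values.add(pair[1]);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(loadAsObject(PAIRS));               // one collation survives
        System.out.println(loadAsArray(PAIRS, "collation"));   // both collations kept
    }
}
```

With the map, only the last collation remains; with the list, both are preserved in order, which is the behavior an array-valued collation would give clients.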


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921720#action_12921720 ] 

Yonik Seeley commented on SOLR-2010:
------------------------------------

FYI, I just committed a fix for the resource leak, along with a couple of other simplifications.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch
>
>
> Improvements to SpellCheckComponent Collate functionality
> Our project requires a better Spell Check Collator.  I'm contributing this as a patch to get suggestions for improvements and in case there is a broader need for these features.
> 1. Only return collations that are guaranteed to result in hits if re-queried (applying original fq params also).  This is especially helpful when there is more than one correction per query.  The 1.4 behavior does not verify that a particular combination will actually return hits.
> 2. Provide the option to get multiple collation suggestions.
> 3. Provide extended collation results including the # of hits re-querying will return and a breakdown of each misspelled word and its correction.
> This patch is similar to what is described in SOLR-507 item #1.  Also, this patch provides a viable workaround for the problem discussed in SOLR-1074.  A dictionary could be created that combines the terms from the multiple fields.  The collator then would prune out any spurious suggestions this would cause.
> This patch adds the following spellcheck parameters:
> 1. spellcheck.maxCollationTries - maximum # of collation possibilities to try before giving up.  Lower values ensure better performance.  Higher values may be necessary to find a collation that can return results.  Default is 0, which maintains backwards-compatible behavior (do not check collations).
> 2. spellcheck.maxCollations - maximum # of collations to return.  Default is 1, which maintains backwards-compatible behavior.
> 3. spellcheck.collateExtendedResult - if true, returns an expanded response format detailing collations found.  default is false, which maintains backwards-compatible behavior.  When true, output is like this (in context):
> <lst name="spellcheck">
> 	<lst name="suggestions">
> 		<lst name="hopq">
> 			<int name="numFound">94</int>
> 			<int name="startOffset">7</int>
> 			<int name="endOffset">11</int>
> 			<arr name="suggestion">
> 				<str>hope</str>
> 				<str>how</str>
> 				<str>hope</str>
> 				<str>chops</str>
> 				<str>hoped</str>
> 				etc
> 			</arr>
> 		</lst>
> 		<lst name="faill">
> 			<int name="numFound">100</int>
> 			<int name="startOffset">16</int>
> 			<int name="endOffset">21</int>
> 			<arr name="suggestion">
> 				<str>fall</str>
> 				<str>fails</str>
> 				<str>fail</str>
> 				<str>fill</str>
> 				<str>faith</str>
> 				<str>all</str>
> 				etc
> 			</arr>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(how AND fails)</str>
> 			<int name="hits">2</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">how</str>
> 				<str name="faill">fails</str>
> 			</lst>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(hope AND faith)</str>
> 			<int name="hits">2</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">hope</str>
> 				<str name="faill">faith</str>
> 			</lst>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(chops AND all)</str>
> 			<int name="hits">1</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">chops</str>
> 				<str name="faill">all</str>
> 			</lst>
> 		</lst>
> 	</lst>
> </lst>
> In addition, SOLRJ is updated to include SpellCheckResponse.getCollatedResults(), which will return the expanded Collation format.  getCollatedResult(), which returns a single String, is retained for backwards-compatibility.  Other APIs were not changed but will still work provided that spellcheck.collateExtendedResult is false.
> This likely will not return valid results if using Shards.  Rather, a more robust interaction with the index would be necessary than what exists in SpellCheckCollator.collate().
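For illustration only, here is a minimal sketch of how a client might assemble a request that uses the three new parameters from the description above.  It only builds a URL query string with the JDK; it does not call Solr or SolrJ.  The class name, the method name, and the values 10 and 3 are assumptions made for the example, and the query text is simply one that matches the offsets in the sample output.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the patch: builds a query string only.
final class CollationParamsSketch {

    static String buildQueryString(String q, int maxTries, int maxCollations, boolean extended) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("spellcheck", "true");
        params.put("spellcheck.collate", "true");
        // 0 (the default) means collations are not tested against the index.
        params.put("spellcheck.maxCollationTries", Integer.toString(maxTries));
        // 1 (the default) returns a single collation.
        params.put("spellcheck.maxCollations", Integer.toString(maxCollations));
        // true switches to the expanded response format.
        params.put("spellcheck.collateExtendedResult", Boolean.toString(extended));

        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return joined.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildQueryString("Title:(hopq AND faill)", 10, 3, true));
    }
}
```

Leaving spellcheck.maxCollationTries at 0 keeps the 1.4 behavior, so a request like this only pays for the extra test queries when the client asks for them.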

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923446#action_12923446 ] 

Grant Ingersoll commented on SOLR-2010:
---------------------------------------

James, I did the merge back to 3.x.  I don't think we will be backporting this to 1.4, since all future releases there are bug-fix only.  

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: multiple_collations_as_an_array.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch, solr_2010_3x.patch
>
>


[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2010:
-----------------------------

    Attachment: SOLR-2010_shardSearchHandler_993538.patch
                SOLR-2010_shardRecombineCollations_993538.patch

Two new versions of the patch:

1. SOLR-2010_shardSearchHandler_993538.patch is the same as the 8/23/2010 version except that it applies cleanly to trunk revision #993538.  In a Distributed setup, this version calls an overloaded method on SearchHandler to reuse its logic for combining results from the collation test queries.  The code is simpler, but it requires many more round-trips between shards.  In exchange, we can guarantee that a Distributed setup will always return exactly the same collations, in the same order, as a non-Distributed setup.

2. SOLR-2010_shardRecombineCollations_993538.patch is similar to the 8/19/2010 version, with improvements.  This version also applies cleanly to trunk revision #993538.  In a Distributed setup, each shard calls QueryComponent individually and generates its own list of Collations.  The SpellCheckComponent then combines and sorts the resulting collations, returning the best ones, up to the client-specified maximum.  This requires more complicated logic in SpellCheckComponent.finishStage(), but it does not require changes to SearchHandler or ResponseBuilder.  There may be cases where a Distributed setup returns different collations (or the same collations in a different order) than a non-Distributed setup.  I do not believe this potential disparity would ever be very significant.
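
To make the recombine idea in version 2 concrete, here is a rough sketch of one way per-shard collations could be merged and trimmed.  It is not the logic in the patch: the class and method names are invented, and both the choice to sum hits for identical collation queries and the choice to rank by total hits are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the code in SpellCheckComponent.finishStage().
final class RecombineCollationsSketch {

    // Each map holds one shard's collations: collation query -> hits on that shard.
    static List<String> recombine(List<Map<String, Long>> perShard, int maxCollations) {
        // Assumption: hits for the same collation query are summed across shards.
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Map<String, Long> shard : perShard) {
            for (Map.Entry<String, Long> e : shard.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }

        // Assumption: "best" means most total hits. List.sort is stable, so ties
        // keep the order in which the collations were first seen.
        List<Map.Entry<String, Long>> ranked = new ArrayList<>(totals.entrySet());
        ranked.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

        // Return at most the client-specified maximum.
        int limit = Math.min(Math.max(maxCollations, 0), ranked.size());
        List<String> best = new ArrayList<>();
        for (Map.Entry<String, Long> e : ranked.subList(0, limit)) {
            best.add(e.getKey());
        }
        return best;
    }
}
```

A merge like this depends on the order in which shard responses arrive whenever totals tie, which is one way a Distributed setup could return the same collations in a different order than a non-Distributed one.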

Grant, I believe version 1 is something like what you were thinking of on 8/9 and 8/19.  Version 2 is more like what you describe in your comment from 8/30.  Let me know if you think this needs any more tweaking.  Also, if you're thinking of committing this someday, you may want to look at SOLR-2049.  Based on my understanding, the distributed SpellCheckComponent as it currently exists in Trunk is broken.  If I'm right, we may want to fix it before adding more functionality.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardSearchHandler_993538.patch
>
>


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "JAYABAALAN V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921264#action_12921264 ] 

JAYABAALAN V commented on SOLR-2010:
------------------------------------

Under revision 1021439 I am able to download only these four Java classes: SpellCheckComponent, SpellCheckResponse, SpellingParams, and TestSpellCheckResponse.  The other Java classes, ResponseBuilder.java and SearchHandler.java, are not updated.

Please let me know the correct path for these two Java classes for revision 1021439.


> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch
>
>


[jira] Issue Comment Edited: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924391#action_12924391 ] 

Yonik Seeley edited comment on SOLR-2010 at 10/24/10 6:53 PM:
--------------------------------------------------------------

bq. James, I did the merge back to 3.x.

FYI, you missed Robert's resource leak fixes to SpellCheckCollatorTest.
Not sure what best practice is to catch stuff like this... if it's only a file or two, I guess check the history of each?

edit: actually your backport to 3x didn't even touch SpellCheckCollatorTest.  I was misled by the fact that when you look at the history of SpellCheckCollatorTest, it shows an update.  But I guess it was just merge properties.  Ugh.

{noformat}
yonik@WOLVERINE /cygdrive/c/code/lusolr_3x
$ svn log ./solr/src/test/org/apache/solr/spelling/SpellCheckCollatorTest.java
------------------------------------------------------------------------
r1026000 | gsingers | 2010-10-21 09:48:34 -0400 (Thu, 21 Oct 2010) | 1 line

SOLR-2010, including Yonik's fix, SOLR-2181 -- hope I did this merge correctly
------------------------------------------------------------------------
r1021439 | gsingers | 2010-10-11 13:32:11 -0400 (Mon, 11 Oct 2010) | 1 line

SOLR-2010: added richer support for spell checking collations
------------------------------------------------------------------------

yonik@WOLVERINE /cygdrive/c/code/lusolr_3x
$ svn diff -r 1021439:1026000 ./solr/src/test/org/apache/solr/spelling/SpellCheckCollatorTest.java                                                   
yonik@WOLVERINE /cygdrive/c/code/lusolr_3x
{noformat}

I'm in the process of getting branch_3x to pass the searcher open/close test, so I'll handle this.

  
> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: multiple_collations_as_an_array.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch, solr_2010_3x.patch
>
>


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916492#action_12916492 ] 

Grant Ingersoll commented on SOLR-2010:
---------------------------------------

James,

For the two different approaches, did you do any testing to get a sense of which performs better?  It seems to me that the recombine one, while overfetching some, would likely be faster overall because it avoids all the extra shard communication.  Of course, it may be the case that for some setups, one works better than the other: small sharded systems can afford the second call, while large systems should avoid the second fan-out/in.  That makes me wonder how hard it would be to give people both and let them choose, either with an input parameter or by having two different components derived from SpellCheckComponent.  Thoughts?  In the latter approach, SCC would be just for single-node instances, and the other two would inherit from it to give users the choice of distributed approaches.  Since you have the code for both already, what do you think?
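
As a rough illustration of the input-parameter option, the choice between the two distributed approaches could be modeled as below.  The parameter name spellcheck.distribCollationMode, its two values, and the default are all hypothetical; nothing like this exists in the patch.

```java
import java.util.Locale;

// Hypothetical sketch: selects a distributed collation approach from a request parameter.
enum DistribCollationMode {
    RECOMBINE,       // each shard builds collations, the coordinator merges them
    SEARCH_HANDLER;  // collation test queries go back through SearchHandler

    // Hypothetical parameter name, used only for this example.
    static final String PARAM = "spellcheck.distribCollationMode";

    static DistribCollationMode fromParam(String raw) {
        // Assumption: fall back to the approach with fewer shard round-trips.
        if (raw == null || raw.trim().isEmpty()) {
            return RECOMBINE;
        }
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "recombine":
                return RECOMBINE;
            case "searchhandler":
                return SEARCH_HANDLER;
            default:
                throw new IllegalArgumentException("Unknown value for " + PARAM + ": " + raw);
        }
    }
}
```

The subclass option would move the same decision into configuration, with one component registered per approach instead of one parameter read per request.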

Otherwise, I've looked at the recombine approach and it seems pretty solid from a "ready to commit" standpoint.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch
>
>
> Improvements to SpellCheckComponent Collate functionality
> Our project requires a better Spell Check Collator.  I'm contributing this as a patch to get suggestions for improvements and in case there is a broader need for these features.
> 1. Only return collations that are guaranteed to result in hits if re-queried (applying original fq params also).  This is especially helpful when there is more than one correction per query.  The 1.4 behavior does not verify that a particular combination will actually return hits.
> 2. Provide the option to get multiple collation suggestions
> 3. Provide extended collation results including the # of hits re-querying will return and a breakdown of each misspelled word and its correction.
> This patch is similar to what is described in SOLR-507 item #1.  Also, this patch provides a viable workaround for the problem discussed in SOLR-1074.  A dictionary could be created that combines the terms from the multiple fields.  The collator then would prune out any spurious suggestions this would cause.
> This patch adds the following spellcheck parameters:
> 1. spellcheck.maxCollationTries - maximum # of collation possibilities to try before giving up.  Lower values ensure better performance.  Higher values may be necessary to find a collation that can return results.  Default is 0, which maintains backwards-compatible behavior (do not check collations).
> 2. spellcheck.maxCollations - maximum # of collations to return.  Default is 1, which maintains backwards-compatible behavior.
> 3. spellcheck.collateExtendedResult - if true, returns an expanded response format detailing the collations found.  Default is false, which maintains backwards-compatible behavior.  When true, output is like this (in context):
> <lst name="spellcheck">
> 	<lst name="suggestions">
> 		<lst name="hopq">
> 			<int name="numFound">94</int>
> 			<int name="startOffset">7</int>
> 			<int name="endOffset">11</int>
> 			<arr name="suggestion">
> 				<str>hope</str>
> 				<str>how</str>
> 				<str>hope</str>
> 				<str>chops</str>
> 				<str>hoped</str>
> 				etc
> 			</arr>
> 		</lst>
> 		<lst name="faill">
> 			<int name="numFound">100</int>
> 			<int name="startOffset">16</int>
> 			<int name="endOffset">21</int>
> 			<arr name="suggestion">
> 				<str>fall</str>
> 				<str>fails</str>
> 				<str>fail</str>
> 				<str>fill</str>
> 				<str>faith</str>
> 				<str>all</str>
> 				etc
> 			</arr>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(how AND fails)</str>
> 			<int name="hits">2</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">how</str>
> 				<str name="faill">fails</str>
> 			</lst>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(hope AND faith)</str>
> 			<int name="hits">2</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">hope</str>
> 				<str name="faill">faith</str>
> 			</lst>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(chops AND all)</str>
> 			<int name="hits">1</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">chops</str>
> 				<str name="faill">all</str>
> 			</lst>
> 		</lst>
> 	</lst>
> </lst>
> In addition, SOLRJ is updated to include SpellCheckResponse.getCollatedResults(), which will return the expanded Collation format.  getCollatedResult(), which returns a single String, is retained for backwards-compatibility.  Other APIs were not changed but will still work provided that spellcheck.collateExtendedResult is false.
> This likely will not return valid results if using Shards.  Rather, a more robust interaction with the index would be necessary than what exists in SpellCheckCollator.collate().
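The verification described above can be sketched roughly as follows.  This is only an illustration, not the code in the attached patch: the class CollationSketch, the Collation holder, and the hitCounter callback (standing in for re-running the query with the original fq params) are all hypothetical names.  It tries candidate combinations in a simple fixed order, whereas the patch may rank them differently, and it substitutes words with String.replace rather than using the start/end offsets.  It covers only the verifying path (maxCollationTries greater than 0).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

class CollationSketch {

    /** One verified collation: rewritten query, hit count, and per-word corrections. */
    static final class Collation {
        final String query;
        final long hits;
        final Map<String, String> corrections;

        Collation(String query, long hits, Map<String, String> corrections) {
            this.query = query;
            this.hits = hits;
            this.corrections = corrections;
        }
    }

    /**
     * Tries at most maxCollationTries combinations of suggestions and returns
     * at most maxCollations of them, keeping only those that produce hits.
     * hitCounter is a hypothetical callback that re-queries the index.
     */
    static List<Collation> collate(String query,
                                   Map<String, List<String>> suggestions,
                                   int maxCollationTries,
                                   int maxCollations,
                                   ToLongFunction<String> hitCounter) {
        List<String> words = new ArrayList<>(suggestions.keySet());
        List<Collation> verified = new ArrayList<>();
        if (words.isEmpty()) {
            return verified;
        }

        // Total number of combinations: product of the suggestion-list sizes.
        long combos = 1;
        for (String w : words) {
            combos *= suggestions.get(w).size();
        }

        for (long n = 0;
             n < combos && n < maxCollationTries && verified.size() < maxCollations;
             n++) {
            // Decode n as a mixed-radix number, one digit per misspelled word.
            long rest = n;
            String candidate = query;
            Map<String, String> corrections = new LinkedHashMap<>();
            for (String w : words) {
                List<String> options = suggestions.get(w);
                String pick = options.get((int) (rest % options.size()));
                rest /= options.size();
                candidate = candidate.replace(w, pick);
                corrections.put(w, pick);
            }
            // Keep the candidate only if re-querying would actually return hits.
            long hits = hitCounter.applyAsLong(candidate);
            if (hits > 0) {
                verified.add(new Collation(candidate, hits, corrections));
            }
        }
        return verified;
    }
}
```

A lower maxCollationTries bounds the number of re-queries (and so the cost), at the risk of giving up before a combination with hits is reached.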

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919992#action_12919992 ] 

James Dyer commented on SOLR-2010:
----------------------------------

Wiki is updated.  I marked it all as available in 3.1 / 4.0.  Is that correct?

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch
>
>


[jira] Issue Comment Edited: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "JAYABAALAN V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919469#action_12919469 ] 

JAYABAALAN V edited comment on SOLR-2010 at 10/9/10 4:00 AM:
-------------------------------------------------------------

Thanks for your direction.

Based on your input I have tried the trunk and downloaded SOLR-2010_shardRecombineCollations_999521.patch.

I used this path:
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/common/org/apache/solr/common/params/SpellingParams.java

But there is a problem with SpellingParams.java in this version.  It does not look correctly updated.  Mainly, the three final String values "maxCollations", "maxCollationTries", and "collateExtendedResults" are implemented, and it looks like Solr v1.3 in the history.

Please let me know the path of the updated version to download.

      was (Author: vjayabaalan):
    Thanks for your direction.

Based on your input i have tried in the truck and used the SOLR-2010_shardRecombineCollations_999521.patch for download.

http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/common/org/apache/solr/common/params/SpellingParams.java.this path 

But there is no problem in the SpellingParams.java under this version.It looks not updated .Mainly three final string values like ""maxCollations","maxCollationTries", and collateExtendedResults are implemented and it looks Solr v1.3 in the history.

Do let me know the updated version path for downloading.
  
> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch
>
>


[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2010:
-----------------------------

    Attachment: SOLR-2010.patch

Third version (with the ".patch" extension; I had used the ".txt" extension for the 2nd version).  Works with trunk rev#986945.

This time SpellCheckCollator calls the SearchHandler instead of calling the QueryComponent.  This required exposing a reference to the SearchHandler on the ResponseBuilder.  Also, a new overload of SearchHandler.processRequestBody() lets you override the list of components to run.  In this case we just have it run QueryComponent.
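As a rough illustration of the overload idea only (this is not the signature in the patch, and HandlerSketch, Component, and handle() are hypothetical names rather than Solr classes), a handler can keep its usual entry point and add a second one that takes the list of components to run:

```java
import java.util.ArrayList;
import java.util.List;

class HandlerSketch {

    /** Hypothetical stand-in for a search component; it only records that it ran. */
    static final class Component {
        final String name;

        Component(String name) {
            this.name = name;
        }

        void process(List<String> trace) {
            trace.add(name);
        }
    }

    private final List<Component> configured;

    HandlerSketch(List<Component> configured) {
        this.configured = configured;
    }

    /** Usual entry point: run every configured component. */
    List<String> handle() {
        return handle(configured);
    }

    /** Overload: run only the components the caller passes in. */
    List<String> handle(List<Component> componentsToRun) {
        List<String> trace = new ArrayList<>();
        for (Component c : componentsToRun) {
            c.process(trace);
        }
        return trace;
    }
}
```

A collator built this way could pass a one-element list holding just the query component, so that checking a candidate collation does not trigger faceting, highlighting, or another round of spellchecking.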

This revision has 2 potential benefits: 
 
(1) the overloaded method in SearchHandler may prove useful to other components in the future.  

(2) there may be a way to get SearchHandler to requery all the shards at once, and then there would be no need to reintegrate the Collations in SearchHandler.finishStage().  However, see my comment in SpellCheckCollator lines 56-57.  Likely I am calling SpellCheckCollator during the wrong "stage" of the distributed request, but I need to find out more specifically how shards work to determine how to further improve this here.  As time allows I will do my own investigating, but anyone's advice would be greatly appreciated.

Finally, this version corrects a bug that would have caused one of the test scenarios in DistributedSpellCheckComponentTest to fail.  Unfortunately in the 2nd version, I had left some scenarios commented-out and did not catch this until now.


> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt
>
>


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924391#action_12924391 ] 

Yonik Seeley commented on SOLR-2010:
------------------------------------

bq. James, I did the merge back to 3.x.

FYI, you missed Robert's resource leak fixes to SpellCheckCollatorTest.
Not sure what best practice is to catch stuff like this... if it's only a file or two, I guess check the history of each?

I'm in the process of getting branch_3x to pass the searcher open/close test, so I'll handle this.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: multiple_collations_as_an_array.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch, solr_2010_3x.patch
>
>


[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2010:
-----------------------------

    Attachment: SOLR-2010.patch

Tested against trunk revision 966633.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Priority: Minor
>         Attachments: SOLR-2010.patch
>
>


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922997#action_12922997 ] 

James Dyer commented on SOLR-2010:
----------------------------------

Maybe I'm looking at the wrong place.  I checked out the 3.x branch at: http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x .  Is that correct?

If you drill into ../solr/src/java/org/apache/solr/spelling/ from there, you won't find the source files added for this case (SpellCheckCollator.java, etc.).

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch, solr_2010_3x.patch



[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896585#action_12896585 ] 

Grant Ingersoll commented on SOLR-2010:
---------------------------------------

James, thanks for the patch.  At first glance this looks great and I would like to see it incorporated.

bq. This likely will not return valid results if using Shards. Rather, a more robust interaction with the index would be necessary than what exists in SpellCheckCollator.collate().

Perhaps we should just have a simple Search Handler that is QueryComp only, either that or we need a way to easily turn off all components but the query component.  That way, we could take advantage of the existing sharding capabilities.



> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch



[jira] Assigned: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned SOLR-2010:
-------------------------------------

    Assignee: Grant Ingersoll

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch



[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2010:
-----------------------------

    Attachment: solr_2010_3x.patch

Here is a patch for the 3.x branch.  This includes Yonik's fix to close the searcher (thanks!).  All tests pass.

Grant, do you feel this is something that can safely go into the 3.x branch in addition to Trunk?

(by the way, I am looking into Yonik's suggestion to change multiple collation results into an Array.  The trick here, I think, is to not break backwards-compatibility...)

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch, solr_2010_3x.patch



[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "JAYABAALAN V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919166#action_12919166 ] 

JAYABAALAN V commented on SOLR-2010:
------------------------------------

Please let me know the correct path for downloading the spell-checking patch discussed here.  It is very difficult to identify the correct patch.

I am trying to download it from the following trunk:

https://svn.apache.org/repos/asf/lucene/dev/trunk/

But the newly modified code is not present in this trunk.  Please provide any pointers or clear steps for downloading this spell-checking patch.



> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch



[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913858#action_12913858 ] 

Grant Ingersoll commented on SOLR-2010:
---------------------------------------

I haven't forgotten about this, James.  I appreciate the hard work, but I'm swamped at the moment.  I'll try to get you feedback soon.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Posted by "James Dyer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2010:
-----------------------------

    Attachment: SOLR-2010_141.patch

Updated the 1.4.1 patch to include Yonik's fix.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch, solr_2010_3x.patch
>
>
> Improvements to SpellCheckComponent Collate functionality
> Our project requires a better Spell Check Collator.  I'm contributing this as a patch to get suggestions for improvements and in case there is a broader need for these features.
> 1. Only return collations that are guaranteed to result in hits if re-queried (applying original fq params also).  This is especially helpful when there is more than one correction per query.  The 1.4 behavior does not verify that a particular combination will actually return hits.
> 2. Provide the option to get multiple collation suggestions.
> 3. Provide extended collation results including the # of hits re-querying will return and a breakdown of each misspelled word and its correction.
> This patch is similar to what is described in SOLR-507 item #1.  Also, this patch provides a viable workaround for the problem discussed in SOLR-1074.  A dictionary could be created that combines the terms from the multiple fields.  The collator then would prune out any spurious suggestions this would cause.
> This patch adds the following spellcheck parameters:
> 1. spellcheck.maxCollationTries - maximum # of collation possibilities to try before giving up.  Lower values ensure better performance.  Higher values may be necessary to find a collation that can return results.  Default is 0, which maintains backwards-compatible behavior (do not check collations).
> 2. spellcheck.maxCollations - maximum # of collations to return.  Default is 1, which maintains backwards-compatible behavior.
> 3. spellcheck.collateExtendedResult - if true, returns an expanded response format detailing the collations found.  Default is false, which maintains backwards-compatible behavior.  When true, output is like this (in context):
> <lst name="spellcheck">
> 	<lst name="suggestions">
> 		<lst name="hopq">
> 			<int name="numFound">94</int>
> 			<int name="startOffset">7</int>
> 			<int name="endOffset">11</int>
> 			<arr name="suggestion">
> 				<str>hope</str>
> 				<str>how</str>
> 				<str>hope</str>
> 				<str>chops</str>
> 				<str>hoped</str>
> 				etc
> 				</arr>
> 		</lst>
> 		<lst name="faill">
> 			<int name="numFound">100</int>
> 			<int name="startOffset">16</int>
> 			<int name="endOffset">21</int>
> 			<arr name="suggestion">
> 				<str>fall</str>
> 				<str>fails</str>
> 				<str>fail</str>
> 				<str>fill</str>
> 				<str>faith</str>
> 				<str>all</str>
> 				etc
> 			</arr>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(how AND fails)</str>
> 			<int name="hits">2</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">how</str>
> 				<str name="faill">fails</str>
> 			</lst>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(hope AND faith)</str>
> 			<int name="hits">2</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">hope</str>
> 				<str name="faill">faith</str>
> 			</lst>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(chops AND all)</str>
> 			<int name="hits">1</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">chops</str>
> 				<str name="faill">all</str>
> 			</lst>
> 		</lst>
> 	</lst>
> </lst>
> In addition, SolrJ is updated to include SpellCheckResponse.getCollatedResults(), which returns the expanded Collation format.  getCollatedResult(), which returns a single String, is retained for backwards-compatibility.  Other APIs were not changed and will still work provided that spellcheck.collateExtendedResult is false.
> This patch likely will not return valid results when using shards.  Supporting shards would require a more robust interaction with the index than what exists in SpellCheckCollator.collate().
