Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2015/11/04 02:31:27 UTC

[jira] [Updated] (SOLR-8057) Change default Sim to BM25 (w/backcompat config handling)

     [ https://issues.apache.org/jira/browse/SOLR-8057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-8057:
---------------------------
    Attachment: SOLR-8057.patch



*NOTE:* I forgot to mention this last time, but the previous patch required the following svn copy before it could be applied...

{code}
svn cp solr/core/src/java/org/apache/solr/search/similarities/DefaultSimilarityFactory.java solr/core/src/java/org/apache/solr/search/similarities/ClassicSimilarityFactory.java
{code}

----

Updated patch focusing on fixing test failures & improving coverage...

* straightforward test fixes/improvements to account for new defaults (mostly related to brittle score assumptions)...
** TestNonDefinedSimilarityFactory
** TestExtendedDismaxParser
** StatsComponentTest
** QueryElevationComponentTest
** TestReRankQParserPlugin
** TestGroupingSearch
* cloned TestPerFieldSimilarity as TestPerFieldSimilarityClassic
** TestPerFieldSimilarityClassic sets an older luceneMatchVersion
** TestPerFieldSimilarity updated to account for new BM25 defaults
* cloned TestDefaultSimilarityFactory as TestDefaultSimilarityFactoryClassic
** TestDefaultSimilarityFactoryClassic sets an older luceneMatchVersion
** TestDefaultSimilarityFactory updated to account for new BM25 defaults
** both of these tests currently trip an assert in DefaultSimilarityFactory because apparently nothing is calling SolrCoreAware.inform(SolrCore) on any per-fieldtype SimilarityFactories that implement SolrCoreAware
* added some logging to ChangedSchemaMergeTest.testOptimizeDiffSchemas to try and make sense of its failure
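To make the lifecycle problem behind those assert failures concrete, here is a minimal, self-contained sketch. It uses hypothetical stand-ins (CoreAware, VersionDependentSimFactory, informPerFieldTypeFactories), not the real Solr classes, and a plain core name instead of a SolrCore. The factory refuses to produce a similarity until inform() has run, and the loader step informs every per-fieldtype factory that implements the callback. This is not the actual Solr code or fix. The real DefaultSimilarityFactory uses an assert, while the sketch throws IllegalStateException so the failure shows up without -ea.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class CoreAwareSketch {
    /** Hypothetical stand-in for SolrCoreAware; a core name replaces the real SolrCore argument. */
    interface CoreAware {
        void inform(String coreName);
    }

    /** Hypothetical factory that cannot pick a similarity until inform() has been called. */
    static class VersionDependentSimFactory implements CoreAware {
        private String coreName; // stays null until inform() runs

        @Override
        public void inform(String coreName) {
            this.coreName = coreName;
        }

        String getSimilarity() {
            // The real factory trips an assert here; throwing makes the failure visible without -ea.
            if (coreName == null) {
                throw new IllegalStateException("inform() was never called");
            }
            // A real factory would choose based on luceneMatchVersion; hardcoded for the sketch.
            return "BM25";
        }
    }

    /** Hypothetical missing step: inform every per-fieldtype factory that implements the callback. */
    static void informPerFieldTypeFactories(List<Object> fieldTypeFactories, String coreName) {
        for (Object factory : fieldTypeFactories) {
            if (factory instanceof CoreAware) {
                ((CoreAware) factory).inform(coreName);
            }
        }
    }

    public static void main(String[] args) {
        VersionDependentSimFactory perFieldType = new VersionDependentSimFactory();
        List<Object> factories = new ArrayList<>();
        factories.add(perFieldType);

        try {
            perFieldType.getSimilarity(); // fails: nobody informed the per-fieldtype factory
        } catch (IllegalStateException e) {
            System.out.println("before inform: " + e.getMessage());
        }

        informPerFieldTypeFactories(factories, "collection1");
        System.out.println("after inform: " + perFieldType.getSimilarity());
    }
}
{code}

Running main prints "before inform: inform() was never called" followed by "after inform: BM25". The open question is which Solr code path should own that loop for per-fieldtype factories.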

NOTE: To apply this new patch, you'll first need to copy/move the following files...

{code}
svn cp solr/core/src/java/org/apache/solr/search/similarities/DefaultSimilarityFactory.java solr/core/src/java/org/apache/solr/search/similarities/ClassicSimilarityFactory.java
svn cp solr/core/src/test/org/apache/solr/search/similarities/TestPerFieldSimilarity.java solr/core/src/test/org/apache/solr/search/similarities/TestPerFieldSimilarityClassic.java
svn cp solr/core/src/test/org/apache/solr/search/similarities/TestDefaultSimilarityFactory.java solr/core/src/test/org/apache/solr/search/similarities/TestDefaultSimilarityFactoryClassic.java
svn mv solr/core/src/test-files/solr/collection1/conf/schema-tfidf.xml solr/core/src/test-files/solr/collection1/conf/schema-sim-default.xml
{code}

Tests still failing with this patch:

* BadIndexSchemaTest.testPerFieldtypeSimButNoSchemaSimFactory
** see previous comment: the javadocs say that "IndexSchema will provide such error checking if a non-SchemaAware instance of SimilarityFactory" but as soon as I made DefaultSimilarityFactory implement SolrCoreAware (NOT SchemaAware), this check seems to have broken
** this looks like a tangentially related bug uncovered by this change.
* TestDefaultSimilarityFactoryClassic + TestDefaultSimilarityFactory
** Both of these tests currently trip an assert in DefaultSimilarityFactory because apparently nothing is calling SolrCoreAware.inform(SolrCore) on any per-fieldtype SimilarityFactories that implement SolrCoreAware
** the bug appears independent of these changes -- any schema specifying a per-fieldtype similarity that is SolrCoreAware should have the same problem
* ChangedSchemaMergeTest.testOptimizeDiffSchemas + TestCloudSchemaless (thread leak due to core reload failures)
** something about the IndexSchemaFactory.buildIndexSchema + SolrCore.setLatestSchema code path isn't properly calling SolrCoreAware.inform(SolrCore) on the default similarity
** the bug appears independent of these changes -- I'm pretty sure any schema specifying a similarity that is SolrCoreAware should have the same problem
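As I read the javadoc quoted in the BadIndexSchemaTest item, the rule is this: if fieldtypes declare their own similarity but the schema-level SimilarityFactory is not SchemaAware, the schema should be rejected. The sketch below shows that check, assuming this interpretation. Everything in it is hypothetical: SchemaAware and CoreAware are empty marker stand-ins, and fieldtype similarities are modeled as a name-to-class map. It is not the IndexSchema implementation. It only illustrates that the decision should depend on SchemaAware alone, so adding SolrCoreAware to a factory should not change the outcome.

{code:java}
import java.util.Map;

public class SchemaSimCheckSketch {
    /** Hypothetical marker stand-ins for Solr's SchemaAware and SolrCoreAware. */
    interface SchemaAware { }
    interface CoreAware { }

    /** Global factory that is core-aware but NOT schema-aware (the situation in the failing test). */
    static class CoreAwareOnlyFactory implements CoreAware { }

    /** Global factory that is schema-aware, so it could delegate to per-fieldtype similarities. */
    static class SchemaAwareFactory implements SchemaAware { }

    /**
     * Hypothetical check: reject a schema whose fieldtypes declare their own similarity
     * when the global factory is not SchemaAware. Other interfaces do not affect the outcome.
     */
    static void validate(Object globalSimFactory, Map<String, String> fieldTypeSims) {
        if (!fieldTypeSims.isEmpty() && !(globalSimFactory instanceof SchemaAware)) {
            throw new IllegalStateException(
                "fieldtypes " + fieldTypeSims.keySet()
                + " declare a similarity, but the global similarity factory is not SchemaAware");
        }
    }

    public static void main(String[] args) {
        Map<String, String> perFieldType = Map.of("text_custom", "ClassicSimilarity");

        validate(new SchemaAwareFactory(), perFieldType); // accepted
        validate(new CoreAwareOnlyFactory(), Map.of());   // accepted: no per-fieldtype sims

        try {
            validate(new CoreAwareOnlyFactory(), perFieldType);
            System.out.println("unexpected: schema accepted");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
{code}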



> Change default Sim to BM25 (w/backcompat config handling)
> ---------------------------------------------------------
>
>                 Key: SOLR-8057
>                 URL: https://issues.apache.org/jira/browse/SOLR-8057
>             Project: Solr
>          Issue Type: Task
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>            Priority: Blocker
>             Fix For: Trunk
>
>         Attachments: SOLR-8057.patch, SOLR-8057.patch
>
>
> LUCENE-6789 changed the default similarity for IndexSearcher to BM25 and renamed "DefaultSimilarity" to "ClassicSimilarity"
> Solr needs to be updated accordingly:
> * a "ClassicSimilarityFactory" should exist w/expected behavior/javadocs
> * default behavior (in 6.0) when no similarity is specified in configs should (ultimately) use BM25 depending on luceneMatchVersion
> ** either by assuming BM25SimilarityFactory or by changing the internal behavior of DefaultSimilarityFactory
> * comments in sample configs need to be updated to reflect the new default behavior
> * the ref guide needs to be updated anywhere it mentions/implies that a particular similarity is used (or implies TF-IDF is used by default)
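Here is one way the version-dependent default from the issue description could work, as a minimal sketch. When no similarity is configured, the code compares luceneMatchVersion against 6.0 and falls back to ClassicSimilarity for older configs. MatchVersion, BM25_DEFAULT_SINCE and defaultSimilarity are hypothetical names. The 6.0 cutoff is an assumption taken from the "(in 6.0)" wording. The parser only handles dotted numeric versions such as "5.3.0", not values like "LATEST".

{code:java}
public class DefaultSimSketch {
    /** Hypothetical, simplified stand-in for Lucene's Version: compares major.minor only. */
    record MatchVersion(int major, int minor) implements Comparable<MatchVersion> {
        static MatchVersion parse(String s) {
            String[] parts = s.split("\\.");
            return new MatchVersion(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
        }

        @Override
        public int compareTo(MatchVersion o) {
            return major != o.major ? Integer.compare(major, o.major) : Integer.compare(minor, o.minor);
        }
    }

    /** Assumed cutoff: configs at or above this luceneMatchVersion get BM25 by default. */
    static final MatchVersion BM25_DEFAULT_SINCE = new MatchVersion(6, 0);

    /** Pick the implicit similarity when the schema declares none (hypothetical helper). */
    static String defaultSimilarity(String luceneMatchVersion) {
        return MatchVersion.parse(luceneMatchVersion).compareTo(BM25_DEFAULT_SINCE) >= 0
            ? "BM25Similarity"
            : "ClassicSimilarity";
    }

    public static void main(String[] args) {
        for (String v : new String[] {"5.3.0", "6.0.0"}) {
            System.out.println(v + " -> " + defaultSimilarity(v));
        }
    }
}
{code}

This follows the second option listed (changing the internal behavior of DefaultSimilarityFactory) rather than implicitly assuming BM25SimilarityFactory. It matches how the *Classic test clones pin an older luceneMatchVersion to keep TF-IDF.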


