Posted to solr-user@lucene.apache.org by Christopher Condit <co...@sdsc.edu> on 2013/09/13 00:04:53 UTC

Stop filter changes in Solr >= 4.4

While attempting to upgrade from Solr 4.3.0 to Solr 4.4.0 I ran into
this exception:

java.lang.IllegalArgumentException: enablePositionIncrements=false is
not supported anymore as of Lucene 4.4 as it can create broken token
streams

which led me to https://issues.apache.org/jira/browse/LUCENE-4963. I
need to be able to match queries irrespective of intervening stopwords
(which used to work with enablePositionIncrements="false"). For
instance: "foo of the bar" would find documents matching "foo bar",
"foo of bar", and "foo of the bar". With this option deprecated in
4.4.0 I'm not clear on how to maintain the same functionality.

The package javadoc adds:

If the selected analyzer filters the stop words "is" and "the", then
for a document containing the string "blue is the sky", only the
tokens "blue", "sky" are indexed, with position("sky") = 3 +
position("blue"). Now, a phrase query "blue is the sky" would find
that document, because the same analyzer filters the same stop words
from that query. But the phrase query "blue sky" would not find that
document because the position increment between "blue" and "sky" is
only 1.

If this behavior does not fit the application needs, the query parser
needs to be configured to not take position increments into account
when generating phrase queries.

But there's no mention of how to actually configure the query parser
to do this. Does anyone know how to deal with this issue as Solr moves
toward 5.0?

Crossposted from stackoverflow:
http://stackoverflow.com/questions/18668376/solr-4-4-stopfilterfactory-and-enablepositionincrements
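
The position arithmetic in the quoted javadoc can be checked with a
small stand-alone model. The sketch below is not Lucene code: the class
StopwordPositions, its analyze, phraseMatches and matches methods, and
the three-word stop set are all hypothetical names chosen for
illustration. It assigns token positions with and without gaps for
removed stop words, then tests exact (slop 0) phrase matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-alone model of stop-word position handling; not Lucene code.
class StopwordPositions {
    // Hypothetical stop set covering the examples in this thread.
    static final Set<String> STOP = new HashSet<>(Arrays.asList("is", "the", "of"));

    public static final class Token {
        public final String term;
        public final int pos;
        Token(String term, int pos) { this.term = term; this.pos = pos; }
    }

    // Whitespace-tokenize, drop stop words, and assign positions.
    // keepGaps=true leaves a hole for each removed stop word;
    // keepGaps=false numbers the surviving tokens consecutively.
    public static List<Token> analyze(String text, boolean keepGaps) {
        List<Token> out = new ArrayList<>();
        int pos = -1;     // position of the last emitted token
        int skipped = 0;  // stop words removed since the last emitted token
        for (String raw : text.trim().toLowerCase().split("\\s+")) {
            if (raw.isEmpty()) continue;
            if (STOP.contains(raw)) { skipped++; continue; }
            pos += 1 + (keepGaps ? skipped : 0);
            skipped = 0;
            out.add(new Token(raw, pos));
        }
        return out;
    }

    private static boolean hasTermAt(List<Token> doc, String term, int pos) {
        for (Token t : doc) {
            if (t.pos == pos && t.term.equals(term)) return true;
        }
        return false;
    }

    // Exact phrase match: every query token must occur in the document
    // at the same relative position as in the query.
    public static boolean phraseMatches(List<Token> doc, List<Token> query) {
        if (query.isEmpty()) return false;
        Token first = query.get(0);
        for (Token anchor : doc) {
            if (!anchor.term.equals(first.term)) continue;
            int shift = anchor.pos - first.pos;
            boolean all = true;
            for (Token q : query) {
                if (!hasTermAt(doc, q.term, q.pos + shift)) { all = false; break; }
            }
            if (all) return true;
        }
        return false;
    }

    // Convenience: analyze both sides with the same setting, then match.
    public static boolean matches(String docText, String queryText, boolean keepGaps) {
        return phraseMatches(analyze(docText, keepGaps), analyze(queryText, keepGaps));
    }

    public static void main(String[] args) {
        String doc = "blue is the sky";
        System.out.println(matches(doc, "blue sky", true));         // false
        System.out.println(matches(doc, "blue is the sky", true));  // true
        System.out.println(matches(doc, "blue sky", false));        // true
        System.out.println(matches("foo bar", "foo of the bar", false)); // true
    }
}
```

With gaps preserved on both sides, a phrase matches only when the same
number of stop words sits between the terms. Dropping the gaps on both
sides reproduces the looser matching described in the post above.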

Re: Stop filter changes in Solr >= 4.4

Posted by Yonik Seeley <yo...@lucidworks.com>.

On Fri, Sep 13, 2013 at 1:07 AM, Shalin Shekhar Mangar
<sh...@gmail.com> wrote:
> AFAIK, enablePositionIncrements=false is deprecated in 4.x but not
> removed. It will be removed in 5.0 though.

Hmmm, I had missed that.

Anyone have pointers to an example of what "broken" means and why it
can't be fixed?
It seems pretty extreme just to remove this functionality that has
been possible OOTB for 10 years.

-Yonik
http://lucidworks.com

Re: Stop filter changes in Solr >= 4.4

Posted by Christopher Condit <co...@sdsc.edu>.

Here's the field definition:
<fieldType name="textLoose" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_en.txt" enablePositionIncrements="false" />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
    <filter class="solr.PorterStemFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="false"
            words="stopwords_en.txt" enablePositionIncrements="false" />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
    <filter class="solr.PorterStemFilterFactory" />
  </analyzer>
</fieldType>

Here's the stack trace:
WARNING: org.apache.solr.client.solrj.SolrServerException:
java.lang.IllegalArgumentException: enablePositionIncrements=false is
not supported anymore as of Lucene 4.4 as it can create broken token
streams
org.apache.solr.client.solrj.SolrServerException:
org.apache.solr.client.solrj.SolrServerException:
java.lang.IllegalArgumentException: enablePositionIncrements=false is
not supported anymore as of Lucene 4.4 as it can create broken token
streams
at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:223)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.addBean(SolrServer.java:136)
at org.apache.solr.client.solrj.SolrServer.addBean(SolrServer.java:125)
at edu.sdsc.nif.vocabulary.VocabularySolrImpl.addTerm(VocabularySolrImpl.java:67)
at edu.sdsc.nif.vocabulary.VocabularySolrImplTest.testGetTermFromIdAndProvider(VocabularySolrImplTest.java:99)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: org.apache.solr.client.solrj.SolrServerException:
java.lang.IllegalArgumentException: enablePositionIncrements=false is
not supported anymore as of Lucene 4.4 as it can create broken token
streams
at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:155)
... 32 more
Caused by: java.lang.IllegalArgumentException:
enablePositionIncrements=false is not supported anymore as of Lucene
4.4 as it can create broken token streams
at org.apache.lucene.analysis.util.FilteringTokenFilter.checkPositionIncrement(FilteringTokenFilter.java:40)
at org.apache.lucene.analysis.util.FilteringTokenFilter.setEnablePositionIncrements(FilteringTokenFilter.java:140)
at org.apache.lucene.analysis.core.StopFilterFactory.create(StopFilterFactory.java:88)
at org.apache.solr.analysis.TokenizerChain.createComponents(TokenizerChain.java:67)
at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:66)
at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:177)
at org.apache.lucene.document.Field.tokenStream(Field.java:552)
at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:95)
at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:245)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:265)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:212)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:582)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:94)
at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150)
... 32 more
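
One possible reading of the last "Caused by" section: StopFilterFactory.create
calls setEnablePositionIncrements, which runs a check that throws. The
sketch below models such a check as gated on the analyzer's match
version. That gating is an assumption and is not confirmed anywhere in
this thread; MatchVersion, check and accepted are hypothetical names,
not Lucene's API. Only the exception message is copied from the trace.

```java
// Stand-alone model of a version-gated argument check; not Lucene code.
final class PositionIncrementGate {
    // Hypothetical stand-in for the configured match version.
    enum MatchVersion { V4_3, V4_4 }

    static final String MESSAGE =
        "enablePositionIncrements=false is not supported anymore as of "
        + "Lucene 4.4 as it can create broken token streams";

    // Assumed rule: reject enablePositionIncrements=false only when the
    // match version is 4.4 or later.
    static void check(MatchVersion version, boolean enablePositionIncrements) {
        boolean onOrAfter44 = version.compareTo(MatchVersion.V4_4) >= 0;
        if (!enablePositionIncrements && onOrAfter44) {
            throw new IllegalArgumentException(MESSAGE);
        }
    }

    // Returns true if the combination passes the check, false if it throws.
    static boolean accepted(MatchVersion version, boolean enablePositionIncrements) {
        try {
            check(version, enablePositionIncrements);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepted(MatchVersion.V4_3, false)); // true
        System.out.println(accepted(MatchVersion.V4_4, false)); // false
        System.out.println(accepted(MatchVersion.V4_4, true));  // true
    }
}
```

If the real check is gated this way, it would explain how the option
can be "deprecated but not removed" and still throw under a 4.4 match
version. Verify against the Lucene 4.4 source before relying on it.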

On Thu, Sep 12, 2013 at 10:07 PM, Shalin Shekhar Mangar
<sh...@gmail.com> wrote:
> Can we see a full stack trace for that IllegalArgumentException?
> AFAIK, enablePositionIncrements=false is deprecated in 4.x but not
> removed. It will be removed in 5.0 though.

Re: Stop filter changes in Solr >= 4.4

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

Can we see a full stack trace for that IllegalArgumentException?
AFAIK, enablePositionIncrements=false is deprecated in 4.x but not
removed. It will be removed in 5.0 though.

-- 
Regards,
Shalin Shekhar Mangar.