You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Diego Oliveira (JIRA)" <ji...@apache.org> on 2016/11/05 01:35:59 UTC

[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"

    [ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638299#comment-15638299 ] 

Diego Oliveira commented on SOLR-6468:
--------------------------------------

I read all discussion and can't believe on this decision. I'm having the same problem!!! I need to use stopword filter + shingle filter. But when removed the stop words I stay with a hole that create a bug for shingle filters... they duplicate tokens that cannot be removed by Remove Duplicate Filter due to shingle and tokens be in distinct initial positions. I don't believe that the community cannot solve this problem enabling this old feature... as the people said in here. It is best stay with the 'simplified' version than with the new (plus) version. it will until Solr X? Come on!!!

> Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
> --------------------------------------------------------------------------------------------
>
>                 Key: SOLR-6468
>                 URL: https://issues.apache.org/jira/browse/SOLR-6468
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.8.1, 4.9
>            Reporter: Alexander S.
>
> Setup:
> * Schema version is 1.5
> * Field config:
> {code}
> <fieldType name="words_ngram" class="solr.TextField" omitNorms="false" autoGeneratePhraseQueries="true">
>   <analyzer>
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
>     <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true" />
>     <filter class="solr.LowerCaseFilterFactory" />
>   </analyzer>
> </fieldType>
> {code}
> * Stop words:
> {code}
> http 
> https 
> ftp 
> www
> {code}
> So very simple. In the index I have:
> * twitter.com/testuser
> All these queries do match:
> * twitter.com/testuser
> * com/testuser
> * testuser
> But none of these does:
> * https://twitter.com/testuser
> * https://www.twitter.com/testuser
> * www.twitter.com/testuser
> Debug output shows:
> "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
> But we need:
> "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"
> Complete debug outputs:
> * a valid search: http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
> * an invalid search: http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
> The complete discussion and explanation of the problem is here: http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
> I didn't find a clear explanation how can we upgrade Solr, there's no any replacement or a workarround to this, so this is not just a major change but a major disrespect to all existing Solr users who are using this feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org