Posted to issues@lucene.apache.org by "rmuir (via GitHub)" <gi...@apache.org> on 2023/02/28 19:35:10 UTC

[GitHub] [lucene] rmuir commented on pull request #12172: Add Romanian stopwords with s&t with comma

rmuir commented on PR #12172:
URL: https://github.com/apache/lucene/pull/12172#issuecomment-1448749282

   Note: if we fix it here, maybe the stemmer should handle this case too? https://github.com/snowballstem/snowball/blob/master/algorithms/romanian.sbl#L26-L27
   
   Alternatively, a TokenFilter could be added that normalizes/folds these characters and runs before the stop filter and the stem filter. That would have the advantage of giving the user a choice (they can just create a CustomAnalyzer and leave out the normalization if they don't want the folding). But it would be overkill if we should really just fix the stopwords and the stemmer. Sorry, I'm not knowledgeable about Romanian, so I don't know whether treating them "the same" causes problems.
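
To make the folding idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. The class name `RomanianFoldSketch` and the method `fold` are hypothetical, and the direction of the mapping (cedilla forms to comma-below forms) is an assumption; the opposite direction would work just as well, as long as the stopword list and the stemmer agree with it. A real TokenFilter would apply the same mapping to each token's term buffer.

```java
// Hypothetical helper, not part of Lucene or Snowball.
// Assumption: the cedilla forms are folded to the comma-below forms.
final class RomanianFoldSketch {
    private RomanianFoldSketch() {}

    // Returns a copy of the input with s/t-cedilla replaced by s/t-comma-below.
    static String fold(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u015E': out.append('\u0218'); break; // S with cedilla (U+015E) -> S with comma below (U+0218)
                case '\u015F': out.append('\u0219'); break; // s with cedilla (U+015F) -> s with comma below (U+0219)
                case '\u0162': out.append('\u021A'); break; // T with cedilla (U+0162) -> T with comma below (U+021A)
                case '\u0163': out.append('\u021B'); break; // t with cedilla (U+0163) -> t with comma below (U+021B)
                default: out.append(c);                     // everything else passes through unchanged
            }
        }
        return out.toString();
    }
}
```

Running this before stopword matching and stemming would mean both spellings of a word reach those steps in a single form, so only one variant has to be listed or handled there.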


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

