Posted to solr-user@lucene.apache.org by Vincenzo D'Amore <v....@gmail.com> on 2016/06/01 23:13:57 UTC
StandardTokenizer behaviour with apostrophe and colon
Hi all,
StandardTokenizer doesn't split text on an apostrophe (the punctuation mark
' ) or on a colon (the punctuation mark : ).
Just to be clear: according to the documentation, all punctuation marks are
delimiters, with the exception of periods (dots), so I would expect an Italian
elided pair like "nell'aria" to be split into two words, "nell" and
"aria".
For now I have worked around the problem using a WordDelimiterFilterFactory.
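Below is a minimal plain-Java sketch (no Lucene dependency) of the kind of post-tokenization splitting this workaround performs on tokens such as "nell'aria". The class and method names are hypothetical, and this is not WordDelimiterFilter's actual implementation. The real filter has many more options, such as splitting on case changes and on letter/digit transitions. The sketch only treats the ASCII apostrophe, the typographic apostrophe (U+2019) and the colon as delimiters, which is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, NOT Lucene code: approximates what a
// WordDelimiterFilter-style step does to tokens that StandardTokenizer
// (per the post) leaves intact, e.g. "nell'aria" or "foo:bar".
public class IntraWordSplitSketch {

    // Delimiters assumed by this sketch: ASCII apostrophe,
    // typographic apostrophe (U+2019) and colon.
    private static boolean isDelimiter(char c) {
        return c == '\'' || c == '\u2019' || c == ':';
    }

    // Splits one token on the delimiters above and drops empty pieces,
    // so "nell'aria" becomes [nell, aria] and "l'" becomes [l].
    public static List<String> splitIntraWord(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (isDelimiter(c)) {
                // Close the current piece, if any, at each delimiter.
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        // Flush the trailing piece.
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        // Tokens that, according to the post, StandardTokenizer does not split.
        String[] tokens = {"nell'aria", "foo:bar", "ciao"};
        for (String t : tokens) {
            System.out.println(t + " -> " + splitIntraWord(t));
        }
    }
}
```

Running main prints each token followed by its pieces, for example `nell'aria -> [nell, aria]`.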
Is this a bug or undocumented behaviour? Either way, what should I do next?
Best regards,
Vincenzo
--
Vincenzo D'Amore
email: v.damore@gmail.com
skype: free.dev
mobile: +39 349 8513251