
Posted to dev@opennlp.apache.org by "kinow (via GitHub)" <gi...@apache.org> on 2023/02/26 21:23:08 UTC

[GitHub] [opennlp] kinow commented on pull request #506: OPENNLP-141 Tokenizers alphanumeric optimization only recognizes a-z as alpha chars

kinow commented on PR #506:
URL: https://github.com/apache/opennlp/pull/506#issuecomment-1445471012

   > For the record: I checked Italian, yet it seems they have no special characters in their alphabet.
   
   They should have some accents. I don't speak Italian, but we had **a lot** of soap operas about Italian immigrants, and "più" always appeared in writing/speaking. Google Translate gives: “nor the most beautiful nor the most ugly” → “né il più bello né il più brutto”. See also https://it.wikipedia.org/wiki/Alfabeto_italiano
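
   To make the accent point concrete, here is a minimal Java sketch. The class name, method names, and both patterns are hypothetical and are not taken from OpenNLP or from this PR. It only contrasts an ASCII-only letter class with the Unicode letter category `\p{L}`, and it assumes the input is in precomposed (NFC) form:

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not the patterns used in OpenNLP or PR #506.
public class AlphaCheck {
    // ASCII-only letters: the kind of pattern that misses accented characters.
    static final Pattern ASCII_ALPHA = Pattern.compile("^[A-Za-z]+$");
    // Any Unicode letter; assumes precomposed (NFC) input, since a combining
    // accent on its own is a mark (\p{M}), not a letter.
    static final Pattern UNICODE_ALPHA = Pattern.compile("^\\p{L}+$");

    static boolean isAsciiAlpha(String token) {
        return ASCII_ALPHA.matcher(token).matches();
    }

    static boolean isUnicodeAlpha(String token) {
        return UNICODE_ALPHA.matcher(token).matches();
    }

    public static void main(String[] args) {
        // "pi\u00f9" is "più" and "n\u00e9" is "né", written with escapes so the
        // example does not depend on the source file encoding.
        for (String token : new String[] {"bello", "pi\u00f9", "n\u00e9"}) {
            System.out.println(token
                    + " ascii=" + isAsciiAlpha(token)
                    + " unicode=" + isUnicodeAlpha(token));
        }
    }
}
```

   With these patterns, "bello" passes both checks, while "più" and "né" pass only the `\p{L}` check. Whether `\p{L}` is too broad for a language-specific tokenizer is a separate question that this sketch does not settle.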
   
   > I thought of Spanish, yet could not find a valid(ated) pattern. Do you know of any proven es regex?
   
   Good question. I'm still studying Spanish, so this is a good opportunity for me to learn more. Give me some time and I will find one (I'll spend some time searching and reading about the alphabet, letters, etc.).

