Posted to dev@lucene.apache.org by "Jon Harper (JIRA)" <ji...@apache.org> on 2015/09/24 12:51:08 UTC
[jira] [Commented] (LUCENE-3929)
org.apache.lucene.analysis.fr.FrenchAnalyzer could introduce french accent
insensitive search.
[ https://issues.apache.org/jira/browse/LUCENE-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906190#comment-14906190 ]
Jon Harper commented on LUCENE-3929:
------------------------------------
But what about the many words with fewer than 5 letters?
For me, this makes it unusable. But feel free to reclose this issue if the current behavior is the expected one.
Here's an example list of almost 400 words that cannot be searched without the correct accents:
{noformat}
{
"à"
"abbé"
"aéra"
"aère"
"aéré"
"âgé"
"âgée"
"âgés"
"aidé"
"ailé"
"aimé"
"aîné"
"aisé"
"aléa"
"allé"
"allô"
"armé"
"axé"
"axée"
"axés"
"basé"
"bâté"
"bavé"
"béat"
"bébé"
"bée"
"béer"
"bées"
"béni"
"bêta"
"bête"
"blé"
"blés"
"boxé"
"buée"
"buté"
"çà"
"café"
"calé"
"casé"
"céda"
"cède"
"cédé"
"cène"
"cèpe"
"ciré"
"cité"
"clé"
"clés"
"clôt"
"codé"
"cône"
"côte"
"côté"
"coté"
"créa"
"crée"
"créé"
"crié"
"cubé"
"curé"
"cuvé"
"daté"
"dé"
"déc."
"deçà"
"déçu"
"défi"
"déjà"
"delà"
"déni"
"dép."
"dès"
"dés"
"dîné"
"dôme"
"dopé"
"doré"
"dosé"
"doté"
"doué"
"dupé"
"duré"
"ébat"
"écho"
"échu"
"écot"
"écru"
"écu"
"écus"
"édam"
"édit"
"égal"
"élan"
"élis"
"élit"
"élu"
"élue"
"élus"
"élut"
"ème"
"émet"
"émir"
"émis"
"émit"
"émoi"
"ému"
"émue"
"émus"
"émut"
"épée"
"épi"
"épia"
"épie"
"épié"
"épis"
"ère"
"ères"
"erré"
"ès"
"étai"
"étal"
"état"
"étau"
"été"
"êtes"
"étés"
"être"
"étui"
"famé"
"fané"
"fée"
"fées"
"fêla"
"fêle"
"fêlé"
"féru"
"fêta"
"fête"
"fêté"
"fétu"
"fève"
"fié"
"fiée"
"fiés"
"figé"
"filé"
"fixé"
"foré"
"fumé"
"fusé"
"futé"
"gagé"
"garé"
"gâté"
"gavé"
"gazé"
"gèle"
"gelé"
"gémi"
"gêna"
"gène"
"gêne"
"gêné"
"géra"
"gère"
"géré"
"gobé"
"gré"
"grès"
"gué"
"gués"
"hâlé"
"hâté"
"héla"
"hèle"
"hélé"
"hère"
"hôte"
"hué"
"huée"
"hués"
"humé"
"idée"
"inné"
"iodé"
"jasé"
"jeté"
"joué"
"jubé"
"jugé"
"juré"
"képi"
"là"
"lacé"
"lamé"
"lavé"
"lès"
"lésa"
"lèse"
"lésé"
"lève"
"levé"
"lié"
"liée"
"liés"
"limé"
"logé"
"loué"
"luné"
"luxé"
"méat"
"mêla"
"mêle"
"mêlé"
"même"
"mène"
"mené"
"mère"
"mimé"
"miné"
"miré"
"misé"
"mité"
"mixé"
"môle"
"mué"
"muée"
"mués"
"muré"
"muté"
"nagé"
"né"
"née"
"nées"
"néon"
"nés"
"névé"
"nié"
"niée"
"niés"
"nô"
"nôs"
"noté"
"noué"
"noyé"
"nuée"
"ô"
"obéi"
"opté"
"orée"
"orné"
"osé"
"osée"
"osés"
"ôta"
"ôte"
"ôté"
"ôtée"
"ôter"
"ôtes"
"ôtés"
"ôtez"
"pâmé"
"pané"
"paré"
"pâté"
"pavé"
"payé"
"pèle"
"pelé"
"pêne"
"père"
"péri"
"pèse"
"pesé"
"pète"
"pété"
"pilé"
"pisé"
"plié"
"pôle"
"posé"
"pré"
"près"
"prés"
"prêt"
"prié"
"pué"
"racé"
"ragé"
"râlé"
"ramé"
"râpé"
"rasé"
"raté"
"rayé"
"ré"
"réel"
"réer"
"réf."
"régi"
"rêne"
"rêva"
"rêve"
"rêvé"
"ridé"
"rimé"
"rivé"
"rôda"
"rôde"
"rôdé"
"rodé"
"rôle"
"rosé"
"rôt"
"rôti"
"rôts"
"roué"
"rué"
"ruée"
"rués"
"rusé"
"saké"
"salé"
"sapé"
"scié"
"sème"
"semé"
"sève"
"sévi"
"skié"
"sucé"
"sué"
"suée"
"sués"
"talé"
"tapé"
"taré"
"tâté"
"taxé"
"tél."
"télé"
"ténu"
"tête"
"têtu"
"thé"
"thés"
"tiré"
"tôle"
"tôt"
"très"
"trié"
"tué"
"tuée"
"tués"
"typé"
"urée"
"urgé"
"usé"
"usée"
"usés"
"vécu"
"vélo"
"vêt"
"vête"
"vêts"
"vêtu"
"vexé"
"vidé"
"viré"
"visé"
"volé"
"voté"
"voué"
"zébu"
"zèle"
"zélé"
"zéro"
"zoné"
}
{noformat}
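To make the request concrete, here is a minimal sketch of accent folding applied to a few words from the list above. It is not what FrenchAnalyzer does internally; it only uses the JDK's java.text.Normalizer to show the matching behavior being asked for. The class name AccentFolding and the method fold are hypothetical names chosen for this illustration. Inside Lucene itself, one would more likely add a folding token filter such as ASCIIFoldingFilter to the analysis chain.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration only; not part of Lucene.
final class AccentFolding {

    // Decompose each character into base letter + combining marks (NFD),
    // then drop the combining marks (Unicode category M).
    static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of file encoding.
        String[] words = {
            "c\u00f4t\u00e9",   // côté
            "cot\u00e9",        // coté
            "c\u00f4te",        // côte
            "d\u00e9j\u00e0",   // déjà
            "de\u00e7\u00e0"    // deçà
        };
        for (String w : words) {
            System.out.println(w + " -> " + fold(w));
        }
    }
}
```

If the same folding is applied at both index time and query time, a query typed without accents matches the accented form. The trade-off is that distinct words such as côté, coté and côte all collapse to the same term.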
> org.apache.lucene.analysis.fr.FrenchAnalyzer could introduce french accent insensitive search.
> ----------------------------------------------------------------------------------------------
>
> Key: LUCENE-3929
> URL: https://issues.apache.org/jira/browse/LUCENE-3929
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: Geoffroy Schneck
> Assignee: Robert Muir
> Priority: Minor
> Fix For: 3.6, 4.0-ALPHA
>
>
> The GermanAnalyzer already does this for umlauts, for example: searching for 'gehort' will return both 'gehört' and 'gehort'.
> I expected that the FrenchAnalyzer would likewise return both 'sécuritaires' and 'securitaires' when searching for either of them, but that is not the case.