Posted to dev@lucene.apache.org by "Jon Harper (JIRA)" <ji...@apache.org> on 2015/09/24 12:51:08 UTC

[jira] [Commented] (LUCENE-3929) org.apache.lucene.analysis.fr.FrenchAnalyzer could introduce french accent insensitive search.

    [ https://issues.apache.org/jira/browse/LUCENE-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906190#comment-14906190 ] 

Jon Harper commented on LUCENE-3929:
------------------------------------

But what about the many words with fewer than 5 letters?
For me, this makes it unusable. But feel free to reclose this issue if the current behavior is the expected one.
Here's an example list of almost 400 words that cannot be searched without the correct accents:
{noformat}
{
    "à"
    "abbé"
    "aéra"
    "aère"
    "aéré"
    "âgé"
    "âgée"
    "âgés"
    "aidé"
    "ailé"
    "aimé"
    "aîné"
    "aisé"
    "aléa"
    "allé"
    "allô"
    "armé"
    "axé"
    "axée"
    "axés"
    "basé"
    "bâté"
    "bavé"
    "béat"
    "bébé"
    "bée"
    "béer"
    "bées"
    "béni"
    "bêta"
    "bête"
    "blé"
    "blés"
    "boxé"
    "buée"
    "buté"
    "çà"
    "café"
    "calé"
    "casé"
    "céda"
    "cède"
    "cédé"
    "cène"
    "cèpe"
    "ciré"
    "cité"
    "clé"
    "clés"
    "clôt"
    "codé"
    "cône"
    "côte"
    "côté"
    "coté"
    "créa"
    "crée"
    "créé"
    "crié"
    "cubé"
    "curé"
    "cuvé"
    "daté"
    "dé"
    "déc."
    "deçà"
    "déçu"
    "défi"
    "déjà"
    "delà"
    "déni"
    "dép."
    "dès"
    "dés"
    "dîné"
    "dôme"
    "dopé"
    "doré"
    "dosé"
    "doté"
    "doué"
    "dupé"
    "duré"
    "ébat"
    "écho"
    "échu"
    "écot"
    "écru"
    "écu"
    "écus"
    "édam"
    "édit"
    "égal"
    "élan"
    "élis"
    "élit"
    "élu"
    "élue"
    "élus"
    "élut"
    "ème"
    "émet"
    "émir"
    "émis"
    "émit"
    "émoi"
    "ému"
    "émue"
    "émus"
    "émut"
    "épée"
    "épi"
    "épia"
    "épie"
    "épié"
    "épis"
    "ère"
    "ères"
    "erré"
    "ès"
    "étai"
    "étal"
    "état"
    "étau"
    "été"
    "êtes"
    "étés"
    "être"
    "étui"
    "famé"
    "fané"
    "fée"
    "fées"
    "fêla"
    "fêle"
    "fêlé"
    "féru"
    "fêta"
    "fête"
    "fêté"
    "fétu"
    "fève"
    "fié"
    "fiée"
    "fiés"
    "figé"
    "filé"
    "fixé"
    "foré"
    "fumé"
    "fusé"
    "futé"
    "gagé"
    "garé"
    "gâté"
    "gavé"
    "gazé"
    "gèle"
    "gelé"
    "gémi"
    "gêna"
    "gène"
    "gêne"
    "gêné"
    "géra"
    "gère"
    "géré"
    "gobé"
    "gré"
    "grès"
    "gué"
    "gués"
    "hâlé"
    "hâté"
    "héla"
    "hèle"
    "hélé"
    "hère"
    "hôte"
    "hué"
    "huée"
    "hués"
    "humé"
    "idée"
    "inné"
    "iodé"
    "jasé"
    "jeté"
    "joué"
    "jubé"
    "jugé"
    "juré"
    "képi"
    "là"
    "lacé"
    "lamé"
    "lavé"
    "lès"
    "lésa"
    "lèse"
    "lésé"
    "lève"
    "levé"
    "lié"
    "liée"
    "liés"
    "limé"
    "logé"
    "loué"
    "luné"
    "luxé"
    "méat"
    "mêla"
    "mêle"
    "mêlé"
    "même"
    "mène"
    "mené"
    "mère"
    "mimé"
    "miné"
    "miré"
    "misé"
    "mité"
    "mixé"
    "môle"
    "mué"
    "muée"
    "mués"
    "muré"
    "muté"
    "nagé"
    "né"
    "née"
    "nées"
    "néon"
    "nés"
    "névé"
    "nié"
    "niée"
    "niés"
    "nô"
    "nôs"
    "noté"
    "noué"
    "noyé"
    "nuée"
    "ô"
    "obéi"
    "opté"
    "orée"
    "orné"
    "osé"
    "osée"
    "osés"
    "ôta"
    "ôte"
    "ôté"
    "ôtée"
    "ôter"
    "ôtes"
    "ôtés"
    "ôtez"
    "pâmé"
    "pané"
    "paré"
    "pâté"
    "pavé"
    "payé"
    "pèle"
    "pelé"
    "pêne"
    "père"
    "péri"
    "pèse"
    "pesé"
    "pète"
    "pété"
    "pilé"
    "pisé"
    "plié"
    "pôle"
    "posé"
    "pré"
    "près"
    "prés"
    "prêt"
    "prié"
    "pué"
    "racé"
    "ragé"
    "râlé"
    "ramé"
    "râpé"
    "rasé"
    "raté"
    "rayé"
    "ré"
    "réel"
    "réer"
    "réf."
    "régi"
    "rêne"
    "rêva"
    "rêve"
    "rêvé"
    "ridé"
    "rimé"
    "rivé"
    "rôda"
    "rôde"
    "rôdé"
    "rodé"
    "rôle"
    "rosé"
    "rôt"
    "rôti"
    "rôts"
    "roué"
    "rué"
    "ruée"
    "rués"
    "rusé"
    "saké"
    "salé"
    "sapé"
    "scié"
    "sème"
    "semé"
    "sève"
    "sévi"
    "skié"
    "sucé"
    "sué"
    "suée"
    "sués"
    "talé"
    "tapé"
    "taré"
    "tâté"
    "taxé"
    "tél."
    "télé"
    "ténu"
    "tête"
    "têtu"
    "thé"
    "thés"
    "tiré"
    "tôle"
    "tôt"
    "très"
    "trié"
    "tué"
    "tuée"
    "tués"
    "typé"
    "urée"
    "urgé"
    "usé"
    "usée"
    "usés"
    "vécu"
    "vélo"
    "vêt"
    "vête"
    "vêts"
    "vêtu"
    "vexé"
    "vidé"
    "viré"
    "visé"
    "volé"
    "voté"
    "voué"
    "zébu"
    "zèle"
    "zélé"
    "zéro"
    "zoné"
}
{noformat}
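
To make the request concrete, here is a minimal sketch of accent folding applied regardless of word length. It is an illustration only and not FrenchAnalyzer's actual code: the function name `fold_accents` and the sample words are assumptions chosen for this example, and the approach (Unicode decomposition followed by removal of combining marks) is a generic technique rather than anything taken from Lucene.

```python
import unicodedata


def fold_accents(word: str) -> str:
    """Return `word` with diacritics removed (hypothetical helper, not a Lucene API)."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks and keep only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose so the result is in the usual NFC form.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    # Short words taken from the list above.
    for w in ["côté", "coté", "côte", "été", "déjà", "çà"]:
        print(w, "->", fold_accents(w))
```

If the same folding is applied both when indexing and when parsing the query, a search for "cote" would match "côté", "coté" and "côte" alike. That is the accent-insensitive behavior asked for here, at the cost of no longer distinguishing those words.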

> org.apache.lucene.analysis.fr.FrenchAnalyzer could introduce french accent insensitive search.
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3929
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3929
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Geoffroy Schneck
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.6, 4.0-ALPHA
>
>
> The GermanAnalyzer does the same with the umlaut, for example: searching for 'gehort' will return both 'gehört' and 'gehort'.
> I expected that the FrenchAnalyzer would also return both 'sécuritaires' and 'securitaires' when searching for either of them, but that is not the case.


