Posted to issues@lucene.apache.org by "Selene Broers (Jira)" <ji...@apache.org> on 2020/07/23 18:45:00 UTC

[jira] [Commented] (LUCENE-9295) Removing grave accent in Dutch Snowball algorithm

    [ https://issues.apache.org/jira/browse/LUCENE-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163861#comment-17163861 ] 

Selene Broers commented on LUCENE-9295:
---------------------------------------

Linguist/software engineer/native Dutch speaker here.

Most Dutch words that use the accent grave are indeed borrowed from French.
 Some examples are "scène" (scene), "caissière" (female cashier), "barrière" (barrier) and "crème" (cream).

Because "ie" represents a different sound in Dutch (["i" in IPA|https://en.wikipedia.org/wiki/Close_front_unrounded_vowel], like the "ee" in "deep"), an accent must be added to the e when the pronunciation needs to be different. This is the case in "caissière" and "barrière".

In syllables containing just the è ("scè-ne" and "crè-me"), the accent is necessary to indicate the correct pronunciation ([IPA ɛ:|https://en.wikipedia.org/wiki/Open-mid_front_unrounded_vowel]). Without it, a Dutch reader would pronounce the e like [IPA e|https://en.wikipedia.org/wiki/Close-mid_front_unrounded_vowel].
 When the character 'e' is at the end of a syllable in Dutch, it's pronounced as the IPA e. 
 When the character 'e' is at the beginning or in the middle of a syllable in Dutch, it's pronounced as the IPA ɛ.
 The character 'è' is pronounced as the IPA ɛ: (a lengthened ɛ), no matter where it appears in the syllable.
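The three rules above can be sketched as a tiny lookup. This is a hypothetical helper (`e_pronunciation` is not part of Lucene or Snowball), it assumes the syllable position has already been determined, and it encodes only the rules stated here, ignoring other cases such as digraphs ("ee", "ie") or unstressed syllables.

```python
def e_pronunciation(letter: str, at_syllable_end: bool) -> str:
    """Return the IPA vowel for 'e' or 'è' following the rules above.

    Hypothetical helper: syllabification is assumed to happen elsewhere,
    and only the three rules from this comment are encoded.
    """
    if letter == "è":
        # è is always the lengthened open-mid vowel, wherever it sits.
        return "ɛ:"
    if letter == "e":
        # Plain e: close-mid at the end of a syllable, open-mid elsewhere.
        return "e" if at_syllable_end else "ɛ"
    raise ValueError(f"expected 'e' or 'è', got {letter!r}")
```

For "blè-ren", `e_pronunciation("è", at_syllable_end=True)` gives "ɛ:", whereas the unaccented spelling would fall under the default rule and give "e".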

A few native Dutch words also use the accent grave, like the exclamation "hè" (meaning "what?") and the verb "blèren" (to squall, to bawl). The accent on "hè" is necessary because "hé" means "hey" and "he" is a nonsense word in Dutch.
 The verb "blèren" is divided into syllables as "blè-ren", so the 'è' is at the end of a syllable.
 Contrary to the normal rule, it should NOT be pronounced like IPA e here, but as IPA ɛ:, and the accent is there to signal exactly that.

This verb should keep the accent in all its conjugations, because the è in "blèren" is a lengthened ɛ (ɛ: in IPA).
 If you remove the accent, the e in the conjugated forms where it sits in the middle of a syllable will be read as a plain ɛ, producing a nonsense word.
 Example:
 ik blèr (IPA blɛ:r, correct first-person singular)
 ik bler (IPA blɛr, incorrect first-person singular, nonsense word)
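To see why this matters for search, here is a minimal Python sketch using Unicode decomposition. It is not Lucene's or Snowball's code, and `strip_grave` and `strip_all_accents` are hypothetical names: removing only the grave accent turns "blèr" into "bler" and "hè" into "he", and folding every accent additionally merges "hè" with "hé".

```python
import unicodedata

COMBINING_GRAVE = "\u0300"


def strip_grave(text: str) -> str:
    """Remove only grave accents (U+0300) after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch != COMBINING_GRAVE)
    # Recompose so other accents (e.g. the acute in "hé") stay precomposed.
    return unicodedata.normalize("NFC", kept)


def strip_all_accents(text: str) -> str:
    """Remove every nonspacing combining mark (Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

With these, `strip_grave("blèr")` returns "bler", and `strip_all_accents("hè")` and `strip_all_accents("hé")` both return "he", so an index built this way could no longer tell "what?" from "hey".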

The only cases I can think of where Dutch uses the accent grave on other vowels are "à la carte" (a loanword from French) and to put extra emphasis on certain words or syllables (without changing the meaning or pronunciation).
 I suppose that is why the Snowball algorithm keeps the accent on the e but not on the other vowels.
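The selective behaviour reported in the issue (grave removed from à, ò, ù and ì, but kept on è) amounts to a small character map. The sketch below is hypothetical Python, not the Snowball or Lucene source; it assumes lowercased, NFC-composed input, and the `fold_e` flag stands in for the change proposed in LUCENE-9295.

```python
# Grave-accented vowels folded to their base letter; è is deliberately absent.
_GRAVE_BASE = {"à": "a", "ò": "o", "ù": "u", "ì": "i"}
GRAVE_FOLD = str.maketrans(_GRAVE_BASE)
# Variant that also folds è to e, as proposed in the issue.
GRAVE_FOLD_WITH_E = str.maketrans({**_GRAVE_BASE, "è": "e"})


def fold_grave(text: str, fold_e: bool = False) -> str:
    """Strip grave accents from lowercase, NFC-composed text.

    Hypothetical helper: by default è is kept, matching the behaviour
    described in the issue; fold_e=True also folds è to e.
    """
    return text.translate(GRAVE_FOLD_WITH_E if fold_e else GRAVE_FOLD)
```

By default `fold_grave("à la carte")` gives "a la carte" and leaves "blèr" untouched, while `fold_grave("blèr", fold_e=True)` gives "bler", the nonsense form from the example above. The trade-off is the one discussed here: folding è improves recall for French loanwords such as "crème", at the cost of conflating native forms where the accent changes the pronunciation.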

> Removing grave accent in Dutch Snowball algorithm
> -------------------------------------------------
>
>                 Key: LUCENE-9295
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9295
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Nguyen Minh Gia Huy
>            Priority: Minor
>              Labels: snowball
>
> I have a concern about how the [Dutch Snowball algorithm|http://snowball.tartarus.org/algorithms/dutch/stemmer.html] handles the grave accent on *è*.
> It removes the grave accent from *{{à}}*, {{*ò*}}, *{{ù}}* and *{{ì}}*, but not from *è*. I wonder whether there is something special about *è* that makes the stemmer leave it alone.
> Also, from [http://www.dutchgrammar.com/en/?n=SpellingAndPronunciation.25], I found out that the grave accent is no longer commonly used in Dutch, except in some borrowed French words.
> If *è* is not that common in Dutch, removing the grave accent from it sounds reasonable to me and would benefit search recall in general.
> I would like to know whether anyone has a strong opinion on this topic. It would also be nice to hear your point of view if you are a Dutch speaker.
> Thanks!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org