Posted to solr-user@lucene.apache.org by cyang2010 <ys...@hotmail.com> on 2011/03/15 02:32:11 UTC

Is WordDelimiterFilterFactory applicable to non-english language?

Does it make sense to apply WordDelimiterFilterFactory to non-English
languages, such as Spanish?  What about Asian languages?


The following are the typical use cases for WordDelimiterFilterFactory.   Are
1, 2, 3, and 4 applicable to all Western languages (including Spanish)?   For
Asian languages, such as Chinese, are 1, 2, and 4 applicable?   Since 1 and 2
are based on alpha-numeric and letter-number distinctions, I am not sure
whether Chinese characters count as alpha characters or letters at all.
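
On the letter question, Unicode itself gives an answer: Chinese characters are classified as letters, only without case. A minimal sketch with Python's standard unicodedata module illustrates this. The helper name describe is made up for this example, and Python is used only for illustration; how a particular Solr version classifies these characters is best confirmed on the Analysis page.

```python
import unicodedata

def describe(ch):
    """Return (Unicode general category, is it a letter?, does it have case?)."""
    category = unicodedata.category(ch)
    is_letter = category.startswith("L")   # Lu, Ll, Lt, Lm, Lo are all letters
    has_case = ch.lower() != ch.upper()    # caseless scripts map to themselves
    return category, is_letter, has_case

for ch in ["a", "Z", "ñ", "中", "5", "-"]:
    print(repr(ch), describe(ch))
```

Here "中" is reported as category Lo (letter, other) with no case, while "ñ" is an ordinary lowercase letter (Ll). So rules that depend on letters versus non-letters can apply to Chinese text, but a rule that depends on upper versus lower case has nothing to act on.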

1. split on intra-word delimiters (all non alpha-numeric characters).
      "Wi-Fi" -> "Wi", "Fi"

2. split on case transition <-- only applicable for languages with case,
right?

3. split on letter-number transition.   "SD500" -> "SD", "500"

4. leading and trailing intra-word delimiters on each subword are ignored

      "//hello---there, 'dude'" -> "hello", "there", "dude"

5. trailing "'s" are removed for each subword.   "O'Neil's" -> "O", "Neil"
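
The five rules above can be sketched in a few lines of Python to see how they behave on non-English input. This is a simplified illustration written for this thread, not Solr's actual implementation: the function names (char_type, is_break, split_subwords) are made up, character classes come from Unicode general categories, and the handling of "'s" is an approximation.

```python
import re
import unicodedata

LETTERS = {"upper", "lower", "alpha"}

def char_type(ch):
    """Classify a character using its Unicode general category."""
    cat = unicodedata.category(ch)
    if cat == "Lu":
        return "upper"
    if cat == "Ll":
        return "lower"
    if cat.startswith("L"):
        return "alpha"   # caseless letter, e.g. a Chinese character
    if cat == "Nd":
        return "digit"
    return "delim"       # everything else acts as a delimiter in this sketch

def is_break(prev, cur):
    """Rule 2 (lower -> upper) and rule 3 (letter <-> digit)."""
    if prev == "lower" and cur == "upper":
        return True
    return (prev in LETTERS) != (cur in LETTERS)

def split_subwords(text):
    text = unicodedata.normalize("NFC", text)
    text = re.sub(r"'[sS](?=\W|$)", "", text)   # rule 5: drop trailing 's
    parts, current, prev = [], "", None
    for ch in text:
        cur = char_type(ch)
        if cur == "delim":                      # rules 1 and 4
            if current:
                parts.append(current)
            current, prev = "", None
            continue
        if current and is_break(prev, cur):
            parts.append(current)
            current = ""
        current += ch
        prev = cur
    if current:
        parts.append(current)
    return parts

print(split_subwords("Wi-Fi"))                    # ['Wi', 'Fi']
print(split_subwords("SD500"))                    # ['SD', '500']
print(split_subwords("//hello---there, 'dude'"))  # ['hello', 'there', 'dude']
print(split_subwords("O'Neil's"))                 # ['O', 'Neil']
print(split_subwords("canción-española"))         # ['canción', 'española']
print(split_subwords("中文2011"))                  # ['中文', '2011']
```

In this sketch, Spanish behaves like English because accented letters are still cased letters. Chinese text is never split by rule 2, and a run of Chinese characters stays together as one subword, so the filter does not segment Chinese into words; rules 1, 3, and 4 still apply.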
    

Appreciate your help.


Re: Is WordDelimiterFilterFactory applicable to non-english language?

Posted by Ahmet Arslan <io...@yahoo.com>.
> Does it make sense to apply
> WordDelimiterFilterFactory to non-english
> language, such as spanish?  

Yes, it makes sense. WDF is especially good for product names such as i-phone,
iphone4, etc.
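
To illustrate why this helps with product names, here is a rough Python sketch of splitting a token into subwords and also keeping the joined form. It is similar in spirit to WDF's catenation options (catenateWords, catenateAll), but it is not the filter's real logic; the function index_terms is a hypothetical helper, and the terms actually produced depend on how the filter is configured.

```python
import re

def index_terms(token):
    """Return the subwords of a token plus their concatenation."""
    # Runs of letters or runs of digits; anything else acts as a delimiter.
    subwords = re.findall(r"[^\W\d_]+|\d+", token.lower())
    terms = list(subwords)
    joined = "".join(subwords)
    if joined not in terms:
        terms.append(joined)
    return terms

print(index_terms("i-phone"))   # ['i', 'phone', 'iphone']
print(index_terms("iphone4"))   # ['iphone', '4', 'iphone4']
```

Both spellings produce the term "iphone", so a query for one variant can match documents that contain the other. Nothing here is specific to English: the same splitting works for product names embedded in Spanish text.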