Posted to dev@jena.apache.org by "Osma Suominen (JIRA)" <ji...@apache.org> on 2018/02/13 13:39:00 UTC
[jira] [Created] (JENA-1488) SelectiveFoldingFilter for jena-text
Osma Suominen created JENA-1488:
-----------------------------------
Summary: SelectiveFoldingFilter for jena-text
Key: JENA-1488
URL: https://issues.apache.org/jira/browse/JENA-1488
Project: Apache Jena
Issue Type: Improvement
Components: Text
Affects Versions: Jena 3.6.0
Reporter: Osma Suominen
Currently there's some support for accent folding in jena-text, because Lucene provides an ASCIIFoldingFilter. When this filter is enabled, a search for "deja vu" will match the literal "déjà vu" in the data.
But we can't use it here at the National Library of Finland (for Finto.fi / Skosmos), because it folds too much! In the Finnish alphabet we use, in addition to the Latin letters a-z (which are in ASCII), the letters å, ä and ö, and these should not be folded to ASCII. So we need a Lucene analyzer that can be configured with an exclude list, something like
new SelectiveFoldingFilter(String excludeChars)
and that can also be configured via the Jena assembler, just like the other analyzers supported by jena-text.
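To illustrate the intended behaviour, here is a minimal, JDK-only sketch of selective folding. It is neither Lucene's ASCIIFoldingFilter nor the proposed SelectiveFoldingFilter: the class SelectiveFolding and its fold method are hypothetical names, and the sketch swaps in Unicode decomposition (java.text.Normalizer, form NFD) followed by removal of combining marks. That covers accented letters such as é and à, but not ASCIIFoldingFilter's other mappings (e.g. æ to "ae"). In an actual Lucene TokenFilter the same per-character logic would be applied to each token's term text.

```java
import java.text.Normalizer;

/**
 * Hypothetical helper, not part of Lucene or jena-text: strips diacritics
 * from every character except those listed in excludeChars.
 */
public final class SelectiveFolding {
    private final String excludeChars;

    public SelectiveFolding(String excludeChars) {
        this.excludeChars = excludeChars;
    }

    public String fold(String input) {
        StringBuilder out = new StringBuilder(input.length());
        // Compose first (NFC) so that e.g. "a" + combining ring becomes the
        // single code point "å" and can be matched against the exclude list.
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        composed.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (excludeChars.indexOf(cp) >= 0) {
                // Excluded letters (e.g. å, ä, ö for Finnish) are kept unchanged.
                out.append(ch);
            } else {
                // Decompose (NFD) and drop combining marks: é -> e + U+0301 -> e
                String decomposed = Normalizer.normalize(ch, Normalizer.Form.NFD);
                out.append(decomposed.replaceAll("\\p{M}", ""));
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        SelectiveFolding folding = new SelectiveFolding("åäöÅÄÖ");
        System.out.println(folding.fold("déjà vu"));  // deja vu
        System.out.println(folding.fold("Mäntsälä")); // Mäntsälä
    }
}
```

With "åäöÅÄÖ" as the exclude list, "déjà vu" still folds to "deja vu", so the accent-insensitive search keeps working, while a Finnish word such as "Mäntsälä" is left untouched.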
This was also briefly discussed on the skosmos-users mailing list:
[https://groups.google.com/d/msg/skosmos-users/x3zR_uRBQT0/Q90-O_iDAQAJ]
Apparently Norwegians have the same problem...
I've discussed this with [~kinow] and he has some initial code to implement this feature, so I think we can turn this into a PR fairly soon.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)