Posted to dev@jena.apache.org by "Code Ferret (JIRA)" <ji...@apache.org> on 2018/03/12 12:53:00 UTC

[jira] [Created] (JENA-1506) Add configurable filters and tokenizers

Code Ferret created JENA-1506:
---------------------------------

             Summary: Add configurable filters and tokenizers
                 Key: JENA-1506
                 URL: https://issues.apache.org/jira/browse/JENA-1506
             Project: Apache Jena
          Issue Type: New Feature
          Components: Text
    Affects Versions: Jena 3.7.0
            Reporter: Code Ferret


In support of JENA-1488, this issue proposes a feature that lets {{ConfigurableAnalyzer}} include defined filters and tokenizers, similar to {{DefinedAnalyzer}}, so that they can take configurable arguments such as {{excludeChars}}. I've looked at {{ConfigurableAnalyzer}} and its assembler, and the change should be straightforward.

I would add tokenizer and filter definitions to {{TextIndexLucene}}, similar to the existing support for defining analyzers:
{code:java}
    text:defineFilters (
        [ text:defineFilter <#foo> ; 
          text:filter [ 
            a text:GenericFilter ;
            text:class "fi.finto.FoldingFilter" ;
            text:params (
                [ text:paramName "excludeChars" ;
                  text:paramType text:TypeString ; 
                  text:paramValue "whatevercharstoexclude" ]
                )
            ] ; 
          ]
      )
{code}
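To make the {{text:class}} / {{text:params}} part of the definition above more concrete, here is a minimal Java sketch of how a {{GenericFilterAssembler}} might instantiate a filter reflectively. It assumes the filter class has a public constructor taking the upstream stream first, followed by one argument per {{text:params}} entry in order, and that {{text:TypeString}} maps to {{java.lang.String}}. {{TokenStream}}, {{Param}}, {{createFilter}} and {{ExcludingFilter}} are hypothetical stand-ins so the example runs without Lucene; they are not Jena's or Lucene's API.

{code:java}
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a GenericFilterAssembler-style factory; not Jena's API.
public class GenericFilterSketch {

    // Stand-in for Lucene's TokenStream so the example runs without Lucene.
    public interface TokenStream extends Iterator<String> {}

    // One text:params entry: the Java type it is assumed to map to, and its value.
    public static final class Param {
        public final Class<?> type;
        public final Object value;
        public Param(Class<?> type, Object value) { this.type = type; this.value = value; }
    }

    // Source stream over a fixed list of tokens.
    public static TokenStream of(String... tokens) {
        Iterator<String> it = Arrays.asList(tokens).iterator();
        return new TokenStream() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public String next() { return it.next(); }
        };
    }

    // Instantiate className via its public (TokenStream, param1, param2, ...) constructor.
    public static TokenStream createFilter(String className, TokenStream input, List<Param> params) {
        Class<?>[] types = new Class<?>[params.size() + 1];
        Object[] args = new Object[params.size() + 1];
        types[0] = TokenStream.class;
        args[0] = input;
        for (int i = 0; i < params.size(); i++) {
            types[i + 1] = params.get(i).type;
            args[i + 1] = params.get(i).value;
        }
        try {
            Constructor<?> ctor = Class.forName(className).getConstructor(types);
            return (TokenStream) ctor.newInstance(args);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot create filter " + className, e);
        }
    }

    // Hypothetical filter: removes every character listed in excludeChars from each token.
    public static class ExcludingFilter implements TokenStream {
        private final TokenStream input;
        private final String excludeChars;

        public ExcludingFilter(TokenStream input, String excludeChars) {
            this.input = input;
            this.excludeChars = excludeChars;
        }

        @Override public boolean hasNext() { return input.hasNext(); }

        @Override public String next() {
            StringBuilder sb = new StringBuilder();
            for (char c : input.next().toCharArray()) {
                if (excludeChars.indexOf(c) < 0) sb.append(c);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        // Equivalent of: text:paramType text:TypeString ; text:paramValue "-'"
        List<Param> params = new ArrayList<>();
        params.add(new Param(String.class, "-'"));
        TokenStream out = createFilter(ExcludingFilter.class.getName(), of("rock-'n'-roll", "o'clock"), params);
        while (out.hasNext()) {
            System.out.println(out.next()); // rocknroll, then oclock
        }
    }
}
{code}

Wrapping the reflective exceptions in a single unchecked exception keeps configuration mistakes (a mistyped {{text:class}}, or parameters that match no constructor) easy to report with the offending class name.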
{{GenericFilterAssembler}} and {{GenericTokenizerAssembler}} could reuse much of the code in {{GenericAnalyzerAssembler}}. The changes to {{ConfigurableAnalyzer}} and {{ConfigurableAnalyzerAssembler}} are straightforward and mostly involve retaining the resource URI rather than extracting its localName.
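The reason for keeping the URI can be shown with a small, hypothetical Java model. {{localName}} mimics extracting the local name, which discards the namespace (so {{<#foo>}} would become just {{foo}}), while {{resolve}} keeps the full URI: it first looks the URI up among filters declared with {{text:defineFilter}} and only then falls back to built-in filters in the {{text:}} namespace by local name. The maps, method names, the {{http://example.org/config#}} base URI, and the strings standing in for filter instances are all assumptions for illustration, not Jena code.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of filter lookup by full URI vs. by local name; not Jena's API.
public class FilterKeySketch {

    public static final String TEXT_NS = "http://jena.apache.org/text#";

    // Built-in filters, known by local name (e.g. text:LowerCaseFilter).
    public static final Map<String, String> BUILT_IN = new HashMap<>();
    // Filters declared via text:defineFilter, keyed by their full URI.
    public static final Map<String, String> DEFINED = new HashMap<>();

    static {
        BUILT_IN.put("LowerCaseFilter", "builtin:LowerCaseFilter");
        BUILT_IN.put("ASCIIFoldingFilter", "builtin:ASCIIFoldingFilter");
    }

    // What extracting the localName does: keep only the text after the last '#' or '/'.
    public static String localName(String uri) {
        int i = Math.max(uri.lastIndexOf('#'), uri.lastIndexOf('/'));
        return uri.substring(i + 1);
    }

    // Resolve a filter reference, preferring defined filters matched by full URI.
    public static String resolve(String uri) {
        String defined = DEFINED.get(uri);
        if (defined != null) return defined;
        if (uri.startsWith(TEXT_NS)) {
            String builtIn = BUILT_IN.get(localName(uri));
            if (builtIn != null) return builtIn;
        }
        throw new IllegalArgumentException("Unknown filter: " + uri);
    }

    public static void main(String[] args) {
        // Hypothetical base URI for the <#foo> definition in the config above.
        DEFINED.put("http://example.org/config#foo", "defined:fi.finto.FoldingFilter");
        System.out.println(resolve(TEXT_NS + "LowerCaseFilter"));      // builtin:LowerCaseFilter
        System.out.println(resolve("http://example.org/config#foo"));  // defined:fi.finto.FoldingFilter
        System.out.println(localName("http://example.org/config#foo")); // foo (namespace lost)
    }
}
{code}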

This addition would make it easy to plug in new tokenizers and filters: either add the classes to the Jena/Fuseki classpath or refer to ones already included in Jena (via Lucene or otherwise), then add the appropriate assembler entries to the configuration.
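The drop-in idea can be modelled in a few lines of plain Java: a tokenizer and a list of filters are named by class, loaded from whatever is on the classpath with {{Class.forName}}, type-checked with {{asSubclass}}, and applied in the configured order. {{Tokenizer}}, {{Filter}}, {{SpaceSplitTokenizer}}, {{LowerCasingFilter}}, {{load}} and {{analyze}} are hypothetical simplifications (real Lucene tokenizers and filters are {{TokenStream}} subclasses, not list transformers); this is a sketch of the pattern, not how Jena wires its analyzers.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical model of loading a tokenizer and filters by class name; not Jena's API.
public class DropInChainSketch {

    // Hypothetical tokenizer contract: split input text into tokens.
    public interface Tokenizer { List<String> tokenize(String text); }

    // Hypothetical filter contract: transform the token list.
    public interface Filter extends UnaryOperator<List<String>> {}

    // Example drop-in tokenizer: splits on runs of whitespace.
    public static class SpaceSplitTokenizer implements Tokenizer {
        @Override public List<String> tokenize(String text) {
            return new ArrayList<>(Arrays.asList(text.trim().split("\\s+")));
        }
    }

    // Example drop-in filter: lower-cases every token.
    public static class LowerCasingFilter implements Filter {
        @Override public List<String> apply(List<String> tokens) {
            List<String> out = new ArrayList<>();
            for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
            return out;
        }
    }

    // Load a class from the classpath, check its type, and call its no-arg constructor.
    public static <T> T load(String className, Class<T> expected) {
        try {
            return Class.forName(className).asSubclass(expected).getConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(className + " is not a usable " + expected.getSimpleName(), e);
        }
    }

    // Tokenize, then apply the filters in the configured order.
    public static List<String> analyze(String tokenizerClass, List<String> filterClasses, String text) {
        List<String> tokens = load(tokenizerClass, Tokenizer.class).tokenize(text);
        for (String f : filterClasses) tokens = load(f, Filter.class).apply(tokens);
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = analyze(
            SpaceSplitTokenizer.class.getName(),
            Arrays.asList(LowerCasingFilter.class.getName()),
            "Apache Jena Text");
        System.out.println(tokens); // [apache, jena, text]
    }
}
{code}

The {{asSubclass}} check means a class that is on the classpath but is the wrong kind of component is rejected with a clear error instead of failing later with an unrelated exception.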



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)