You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@jena.apache.org by xristy <gi...@git.apache.org> on 2018/06/13 18:16:01 UTC

[GitHub] jena pull request #436: JENA-1556 implementation

GitHub user xristy opened a pull request:

    https://github.com/apache/jena/pull/436

    JENA-1556 implementation

    implements proposed enhancements in JENA-1556 based on 3.8.0-Snapshot

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BuddhistDigitalResourceCenter/jena JENA-1556-MutilingualEnhancements-3.8.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/436.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #436
    
----
commit 0d07ca904f9495f5a320faf5b7dd761d5f7294ab
Author: Chris Tomlinson <ct...@...>
Date:   2018-06-13T18:13:15Z

    JENA-1556 implementation

----


---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by kinow <gi...@git.apache.org>.

Github user kinow commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195659038
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java ---
    @@ -549,56 +567,81 @@ private String frags2string(TextFragment[] frags, HighlightOpts opts) {
             }
             return results ;
         }
    +    
    +    private Map<String, Analyzer> multilingualQueryAnalyzers = new HashMap<>();
    --- End diff --
    
    Might be better declared at the top with other vars perhaps.


---

[GitHub] jena issue #436: JENA-1556 implementation

Posted by afs <gi...@git.apache.org>.

Github user afs commented on the issue:

    https://github.com/apache/jena/pull/436
  
    Just checking in - when this is ready, merge it in.
    
    I don't have an opinion on @kinow's comments - I don't work closely with this area of the code - but I would say that simplifying/refactoring can happen as a second PR.
    



---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195738825
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java ---
    @@ -549,56 +567,81 @@ private String frags2string(TextFragment[] frags, HighlightOpts opts) {
             }
             return results ;
         }
    +    
    +    private Map<String, Analyzer> multilingualQueryAnalyzers = new HashMap<>();
    --- End diff --
    
    moved


---

[GitHub] jena issue #436: JENA-1556 implementation

Posted by kinow <gi...@git.apache.org>.

Github user kinow commented on the issue:

    https://github.com/apache/jena/pull/436
  
    The docs look really good! Only feedback I have:
    
    * It's possible to avoid the empty commit, by amending the last commit adding "this closes #123". That's what I normally do when I work in other projects.
    * Should we also merge + --no-ff, or is fetch/rebase all right too? I've done it a few times, but happy to do merge + --no-ff instead if desirable 
    * Perhaps a link in the website for it, under Contributing? Or a new page? It's probably related to http://jena.apache.org/getting_involved/reviewing_contributions.html too? (or maybe there's already a link somewhere?)
    
    Thanks


---

[GitHub] jena issue #436: JENA-1556 implementation

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on the issue:

    https://github.com/apache/jena/pull/436
  
    Thanks everyone. I seemed to have finally merged PR#436. I ended up using @rvesse plus a bit from the commit workflow (which I agree should be linked on the Contributing page):
    
        > git checkout master
        > git pull apache master
        > git merge Jena-1556-MutilingualEnhancements-3.8.0
        > mvn clean install -Pdev
        > git push apache master
    
    I put the proper defns in the jena/.git/config for the remotes


---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195743299
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/assembler/DefineAnalyzersAssembler.java ---
    @@ -39,7 +44,46 @@
                  text:analyzer [ . . . ]]
             )
         */
    +    private static Logger          log      = LoggerFactory.getLogger(DefineAnalyzersAssembler.class) ;
    --- End diff --
    
    fixed. sloppy copy/paste. thanks


---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195426932
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/assembler/DefineAnalyzersAssembler.java ---
    @@ -84,6 +121,37 @@ public static boolean open(Assembler a, Resource list) {
                             throw new TextIndexException("addAnalyzers text:defineAnalyzer property must be a non-blank resource: " + adding);
                         }
                     }
    +                
    +                String langCode = null;
    +                
    +                if (adding.hasProperty(TextVocab.pAddLang)) {
    +                    Statement langStmt = adding.getProperty(TextVocab.pAddLang);
    +                    langCode = langStmt.getString();
    +                    Util.addAnalyzer(langCode, analyzer);
    +                    isMultilingualSupport = true;
    +                }
    +                
    +                if (langCode != null && adding.hasProperty(TextVocab.pSearchFor)) {
    +                    Statement searchForStmt = adding.getProperty(TextVocab.pSearchFor);
    +                    List<String> tags = getStringList(searchForStmt, "text:searchFor");
    +                    Util.addSearchForTags(langCode, tags);
    +                }
    +                
    +                if (langCode != null && adding.hasProperty(TextVocab.pAuxIndex)) {
    +                    Statement searchForStmt = adding.getProperty(TextVocab.pAuxIndex);
    +                    List<String> tags = getStringList(searchForStmt, "text:auxIndex");
    +                    Util.addAuxIndexes(langCode, tags);
    +                    log.trace("addAuxIndexes for {} with tags: {}", langCode, tags);
    +                }
    +                
    +                
    --- End diff --
    
    deleted. thanks for your eagle eye.


---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by kinow <gi...@git.apache.org>.

Github user kinow commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195659628
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/analyzer/QueryMultilingualAnalyzer.java ---
    @@ -0,0 +1,75 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.jena.query.text.analyzer ;
    +
    +import org.apache.lucene.analysis.Analyzer ;
    +import org.apache.lucene.analysis.DelegatingAnalyzerWrapper;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +/** 
    + * Lucene Analyzer implementation that delegates to a language-specific
    + * Analyzer based on a field name suffix: e.g. field="label_en" will use
    + * an EnglishAnalyzer.
    + */
    +
    +public class QueryMultilingualAnalyzer extends DelegatingAnalyzerWrapper {
    +        private static Logger log = LoggerFactory.getLogger(QueryMultilingualAnalyzer.class);
    +        private Analyzer defaultAnalyzer;
    +        private String langTag;
    +
    +        public QueryMultilingualAnalyzer(Analyzer defaultAnalyzer) {
    +                super(PER_FIELD_REUSE_STRATEGY);
    +                this.defaultAnalyzer = defaultAnalyzer;
    +                this.langTag = null;
    +        }
    +
    +        public QueryMultilingualAnalyzer(Analyzer defaultAnalyzer, String tag) {
    +                super(PER_FIELD_REUSE_STRATEGY);
    +                this.defaultAnalyzer = defaultAnalyzer;
    +                this.langTag = tag;
    +        }
    +
    +        @Override
    +        /**
    +         * The analyzer corresponding to the langTag supplied at instantiation
    +         * is used to retrieve the analyzer to use regardless of the tag on the
    +         * fieldName. If no langTag is supplied then the tag on fieldName is
    +         * used to retrieve the analyzer as with the MultilingualAnalyzer
    +         * 
    +         * @param fieldName
    +         * @return the analyzer to use in the search
    +         */
    +        protected Analyzer getWrappedAnalyzer(String fieldName) {
    +                int idx = fieldName.lastIndexOf("_");
    --- End diff --
    
    Weird formatting in this file, but no blocker.


---

[GitHub] jena issue #436: JENA-1556 implementation

Posted by afs <gi...@git.apache.org>.

Github user afs commented on the issue:

    https://github.com/apache/jena/pull/436
  
    Committer workflow:
    https://cwiki.apache.org/confluence/display/JENA/Commit+Workflow+for+Github-ASF


---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by kinow <gi...@git.apache.org>.

Github user kinow commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195660039
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/assembler/DefineAnalyzersAssembler.java ---
    @@ -39,7 +44,46 @@
                  text:analyzer [ . . . ]]
             )
         */
    +    private static Logger          log      = LoggerFactory.getLogger(DefineAnalyzersAssembler.class) ;
    --- End diff --
    
    Weird spacing here?


---

[GitHub] jena issue #436: JENA-1556 implementation

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on the issue:

    https://github.com/apache/jena/pull/436
  
    @afs I will work on unit tests over the next couple of days. I wanted to get the main code in for review. It would be great to get into 3.8.0 pending @kinow and others reviews.


---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by afs <gi...@git.apache.org>.

Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195399217
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java ---
    @@ -416,7 +433,8 @@ private Query parseQuery(String queryString, Analyzer analyzer) throws ParseExce
                 throw new TextIndexParseException(qs, ex.getMessage()) ;
             }
             catch (Exception ex) {
    -            throw new TextIndexException(ex) ;
    +            ex.printStackTrace(); // TEMPORARY 
    --- End diff --
    
    Remove this or turn into log debug.


---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195425502
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/assembler/TextVocab.java ---
    @@ -108,6 +108,10 @@
         public static final Property pDefTokenizer      = Vocab.property(NS, "defineTokenizer");
         public static final Property pAddLang           = Vocab.property(NS, "addLang");
         public static final Property pUseAnalyzer       = Vocab.property(NS, "useAnalyzer");
    +    public static final Property pSearchFor         = Vocab.property(NS, "searchFor");
    +    public static final Property pAuxIndex          = Vocab.property(NS, "auxIndex");
    +    public static final Property pIndexAnalyzer     = Vocab.property(NS, "indexAnalyzer");
    +    public static final Property indexAnalyzer      = Vocab.property(NS, "IndexAnalyzer");
    --- End diff --
    
    deleted. thanks for noticing


---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by kinow <gi...@git.apache.org>.

Github user kinow commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195659829
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/analyzer/QueryMultilingualAnalyzer.java ---
    @@ -0,0 +1,75 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.jena.query.text.analyzer ;
    +
    +import org.apache.lucene.analysis.Analyzer ;
    +import org.apache.lucene.analysis.DelegatingAnalyzerWrapper;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +/** 
    + * Lucene Analyzer implementation that delegates to a language-specific
    + * Analyzer based on a field name suffix: e.g. field="label_en" will use
    + * an EnglishAnalyzer.
    + */
    +
    +public class QueryMultilingualAnalyzer extends DelegatingAnalyzerWrapper {
    +        private static Logger log = LoggerFactory.getLogger(QueryMultilingualAnalyzer.class);
    +        private Analyzer defaultAnalyzer;
    +        private String langTag;
    +
    +        public QueryMultilingualAnalyzer(Analyzer defaultAnalyzer) {
    +                super(PER_FIELD_REUSE_STRATEGY);
    +                this.defaultAnalyzer = defaultAnalyzer;
    +                this.langTag = null;
    +        }
    +
    +        public QueryMultilingualAnalyzer(Analyzer defaultAnalyzer, String tag) {
    +                super(PER_FIELD_REUSE_STRATEGY);
    +                this.defaultAnalyzer = defaultAnalyzer;
    +                this.langTag = tag;
    +        }
    +
    +        @Override
    +        /**
    +         * The analyzer corresponding to the langTag supplied at instantiation
    +         * is used to retrieve the analyzer to use regardless of the tag on the
    +         * fieldName. If no langTag is supplied then the tag on fieldName is
    +         * used to retrieve the analyzer as with the MultilingualAnalyzer
    +         * 
    +         * @param fieldName
    +         * @return the analyzer to use in the search
    +         */
    +        protected Analyzer getWrappedAnalyzer(String fieldName) {
    +                int idx = fieldName.lastIndexOf("_");
    +                if (idx == -1) { // not language-specific, e.g. "label"
    +                        return defaultAnalyzer;
    +                }
    +                String lang = langTag != null ? langTag : fieldName.substring(idx+1);
    +                Analyzer analyzer = Util.getLocalizedAnalyzer(lang);
    +                analyzer = analyzer != null ? analyzer : defaultAnalyzer;
    --- End diff --
    
    Maybe simplify statements like these with
    
    ```java
    analyzer = ObjectUtils.defaultIfNull(analyzer, defaultAnalyzer);
    ```


---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by afs <gi...@git.apache.org>.

Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195400337
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/analyzer/QueryMultilingualAnalyzer.java ---
    @@ -0,0 +1,76 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.jena.query.text.analyzer ;
    +
    +import org.apache.lucene.analysis.Analyzer ;
    +import org.apache.lucene.analysis.DelegatingAnalyzerWrapper;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +
    --- End diff --
    
    Minor: two blank lines. There are several of these. 
    
    (This is not a request to change - it's the people who work most on the text subsystem to agree.)


---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by afs <gi...@git.apache.org>.

Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195400731
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/assembler/DefineAnalyzersAssembler.java ---
    @@ -84,6 +121,37 @@ public static boolean open(Assembler a, Resource list) {
                             throw new TextIndexException("addAnalyzers text:defineAnalyzer property must be a non-blank resource: " + adding);
                         }
                     }
    +                
    +                String langCode = null;
    +                
    +                if (adding.hasProperty(TextVocab.pAddLang)) {
    +                    Statement langStmt = adding.getProperty(TextVocab.pAddLang);
    +                    langCode = langStmt.getString();
    +                    Util.addAnalyzer(langCode, analyzer);
    +                    isMultilingualSupport = true;
    +                }
    +                
    +                if (langCode != null && adding.hasProperty(TextVocab.pSearchFor)) {
    +                    Statement searchForStmt = adding.getProperty(TextVocab.pSearchFor);
    +                    List<String> tags = getStringList(searchForStmt, "text:searchFor");
    +                    Util.addSearchForTags(langCode, tags);
    +                }
    +                
    +                if (langCode != null && adding.hasProperty(TextVocab.pAuxIndex)) {
    +                    Statement searchForStmt = adding.getProperty(TextVocab.pAuxIndex);
    +                    List<String> tags = getStringList(searchForStmt, "text:auxIndex");
    +                    Util.addAuxIndexes(langCode, tags);
    +                    log.trace("addAuxIndexes for {} with tags: {}", langCode, tags);
    +                }
    +                
    +                
    --- End diff --
    
    Minor:Two blank lines.
    
    It is one-style higher up at L139.


---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by kinow <gi...@git.apache.org>.

Github user kinow commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195658758
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java ---
    @@ -316,6 +326,13 @@ protected Document doc(Entity entity) {
                         if (this.isMultilingual) {
                             // add a field that uses a language-specific analyzer via MultilingualAnalyzer
                             doc.add(new Field(e.getKey() + "_" + lang, (String) e.getValue(), ftText));
    +                        // add fields for any defined auxiliary indexes
    +                        List<String> auxIndexes = Util.getAuxIndexes(lang);
    +                        if (auxIndexes != null) {
    --- End diff --
    
    Never null I believe. We return an empty list when `lang` is empty. And use a `Hashtable` to keep data. But happy to leave it if you prefer to double-check it anyway (wonder if we should consider `@Nullable` et `@NotNull` in method signatures some day)


---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195425341
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java ---
    @@ -416,7 +433,8 @@ private Query parseQuery(String queryString, Analyzer analyzer) throws ParseExce
                 throw new TextIndexParseException(qs, ex.getMessage()) ;
             }
             catch (Exception ex) {
    -            throw new TextIndexException(ex) ;
    +            ex.printStackTrace(); // TEMPORARY 
    --- End diff --
    
    deleted


---

[GitHub] jena issue #436: JENA-1556 implementation

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on the issue:

    https://github.com/apache/jena/pull/436
  
    I haven't updated the documentation yet - which I plan to do over the next couple of days. I'm happy to mark [JENA-1556](https://issues.apache.org/jira/browse/JENA-1556) resolved now.


---

[GitHub] jena issue #436: JENA-1556 implementation

Posted by afs <gi...@git.apache.org>.

Github user afs commented on the issue:

    https://github.com/apache/jena/pull/436
  
    I have a copy of Jena cloned from git-wip-us.apache.org = a added second remote for github (this is so it is called as "github", not the full URL "https://github.com/apache/jena.git").
    
    To merge, I have been using that one, merge from github (--no-ff) into local master and push to origin/master.
    
    Is there anything we can do to improve https://cwiki.apache.org/confluence/display/JENA/Commit+Workflow+for+Github-ASF?


---

[GitHub] jena issue #436: JENA-1556 implementation

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on the issue:

https://github.com/apache/jena/pull/436

@kinow I think the configuration and results reflect the `text:searchFor` functionality; however, in the analyzer defn for the tag `jp`:

text:analyzer [
a text:GenericAnalyzer ;
text:class "org.apache.lucene.analysis.ja.JapaneseAnalyzer" ;
text:tokenizer <#tokenizer> ;
]

the `text:tokenizer <#tokenizer> ;` is not effective. Tokenizer specs work with `ConfigurableAnalyzer` and are ignored in `text:GenericAnalyzer`. Perhaps a warning should be logged but that means checking for the presence of unsupported predicates?

Re:
> the complexity put on TextIndexLucene. A few methods are getting a boolean flag to change their behaviour. And when that happens too much, sometimes it may feel like the method has two behaviours, and writing tests or changing it may be challenging. Maybe it could extend it in some other way.

I'm not sure how to improve this. The flag in `highlightResults` affects the value of the `effectiveField` in the context of a larger method, and the flag in `getQueryAnalyzer` conditions whether any useful work is done or not. I factored that as a method rather than leaving it inline in `query$` to reduce the clutter in that principal routine.

Re:
> it's not a batteries-included feature, if I understand correctly. You still need to prepare the other part of the solution, be it a tokenizer that gets a value such as "kinou", then searches some dictionary, and finally create tokens for :ex3 dc:title "昨日" and "きのう", or change the data a bit. Maybe this could be a separate project, or an extension of sorts.

I'm not sure what you are recommending here. The `text:searchFor` and `text:auxIndex` functionalities are ways of configuring the _application_ of appropriate analyzers that have been separately defined. So yes the features are not self-contained in that analyzers do have to be supplied.

---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195736987
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java ---
    @@ -316,6 +326,13 @@ protected Document doc(Entity entity) {
                         if (this.isMultilingual) {
                             // add a field that uses a language-specific analyzer via MultilingualAnalyzer
                             doc.add(new Field(e.getKey() + "_" + lang, (String) e.getValue(), ftText));
    +                        // add fields for any defined auxiliary indexes
    +                        List<String> auxIndexes = Util.getAuxIndexes(lang);
    +                        if (auxIndexes != null) {
    --- End diff --
    
    ``auxIndexes.get(tag)`` will return null if ``tag`` has no auxiliary indexes defined - which is not an unusual case. 
    
    As I look at ``Util.getAuxIndexes`` I unnecessarily return an empty list that then does no work in the guarded ``for`` loop. I'm changing to ``null`` in the case of a ``null`` or empty ``tag``.


---

[GitHub] jena issue #436: JENA-1556 implementation

Posted by kinow <gi...@git.apache.org>.

Github user kinow commented on the issue:

    https://github.com/apache/jena/pull/436
  
    This weekend I'm working on Commons, but planning on trying this pull request if time allows, so hopefully providing feedback if that worked with the Japanese example discussed in JIRA soon.


---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by afs <gi...@git.apache.org>.

Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195399760
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/assembler/TextVocab.java ---
    @@ -108,6 +108,10 @@
         public static final Property pDefTokenizer      = Vocab.property(NS, "defineTokenizer");
         public static final Property pAddLang           = Vocab.property(NS, "addLang");
         public static final Property pUseAnalyzer       = Vocab.property(NS, "useAnalyzer");
    +    public static final Property pSearchFor         = Vocab.property(NS, "searchFor");
    +    public static final Property pAuxIndex          = Vocab.property(NS, "auxIndex");
    +    public static final Property pIndexAnalyzer     = Vocab.property(NS, "indexAnalyzer");
    +    public static final Property indexAnalyzer      = Vocab.property(NS, "IndexAnalyzer");
    --- End diff --
    
    Is this meant to be a class? If so, Resource, not Property.  If not, pIndexAnalyzer covers it.


---

[GitHub] jena issue #436: JENA-1556 implementation

Posted by afs <gi...@git.apache.org>.

Github user afs commented on the issue:

    https://github.com/apache/jena/pull/436
  
    Much cleaner!
    
    I don't see tests.
    
    Do you want to get this into the 3.8.0 release? I'm happy to hold off the release for a short while so this PR can be discussed and merged. (I'd like to get 3.8.0 done this month but we don't have a hard deadline.)
    
    @xristy What's your guidance here?


---

[GitHub] jena issue #436: JENA-1556 implementation

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on the issue:

    https://github.com/apache/jena/pull/436
  
    @kinow Thank you very much for the pointers. I think I'm closer. 
    
    I added a remote apache-origin (should have called it jena-origin I suppose).
    
    Then did:
    
        git push apache-origin
    
    and got encouraging result:
    
        Counting objects: 3316, done.
        Delta compression using up to 8 threads.
        Compressing objects: 100% (1017/1017), done.
        Writing objects: 100% (3316/3316), 795.73 KiB | 99.47 MiB/s, done.
        Total 3316 (delta 1505), reused 3218 (delta 1453)
        remote: jena git commit: added auxIndex unit test
        remote: jena git commit: added searchFor unit tests
        remote: jena git commit: various cleanup per @kinow
        remote: jena git commit: cleanup per comments from afs
        remote: jena git commit: JENA-1556 implementation
        To https://git-wip-us.apache.org/repos/asf/jena.git
         * [new branch]            JENA-1556-MutilingualEnhancements-3.8.0 -> JENA-1556-MutilingualEnhancements-3.8.0
    
    unfortunately I don't know how to merge the new branch into `master`. This is my first time working directly with the Apache git repo.


---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195426563
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/analyzer/QueryMultilingualAnalyzer.java ---
    @@ -0,0 +1,76 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.jena.query.text.analyzer ;
    +
    +import org.apache.lucene.analysis.Analyzer ;
    +import org.apache.lucene.analysis.DelegatingAnalyzerWrapper;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +
    --- End diff --
    
    deleted the extra line from classes in ``org.apache.jena.query.text.analyzer`` to bring into line with rest of jena-text


---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195742866
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/analyzer/QueryMultilingualAnalyzer.java ---
    @@ -0,0 +1,75 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.jena.query.text.analyzer ;
    +
    +import org.apache.lucene.analysis.Analyzer ;
    +import org.apache.lucene.analysis.DelegatingAnalyzerWrapper;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +/** 
    + * Lucene Analyzer implementation that delegates to a language-specific
    + * Analyzer based on a field name suffix: e.g. field="label_en" will use
    + * an EnglishAnalyzer.
    + */
    +
    +public class QueryMultilingualAnalyzer extends DelegatingAnalyzerWrapper {
    +        private static Logger log = LoggerFactory.getLogger(QueryMultilingualAnalyzer.class);
    +        private Analyzer defaultAnalyzer;
    +        private String langTag;
    +
    +        public QueryMultilingualAnalyzer(Analyzer defaultAnalyzer) {
    +                super(PER_FIELD_REUSE_STRATEGY);
    +                this.defaultAnalyzer = defaultAnalyzer;
    +                this.langTag = null;
    +        }
    +
    +        public QueryMultilingualAnalyzer(Analyzer defaultAnalyzer, String tag) {
    +                super(PER_FIELD_REUSE_STRATEGY);
    +                this.defaultAnalyzer = defaultAnalyzer;
    +                this.langTag = tag;
    +        }
    +
    +        @Override
    +        /**
    +         * The analyzer corresponding to the langTag supplied at instantiation
    +         * is used to retrieve the analyzer to use regardless of the tag on the
    +         * fieldName. If no langTag is supplied then the tag on fieldName is
    +         * used to retrieve the analyzer as with the MultilingualAnalyzer
    +         * 
    +         * @param fieldName
    +         * @return the analyzer to use in the search
    +         */
    +        protected Analyzer getWrappedAnalyzer(String fieldName) {
    +                int idx = fieldName.lastIndexOf("_");
    --- End diff --
    
    the format is the result of copy/paste of ``MultilingualAnalyzer``. I'm _correcting_ the formatting of ``IndexMultilingualAnalyzer`` and ``QueryMultilingualAnalyzer`` to match the majority of other classes in jena-text.
    
    I'm not sure when I see these sorts of differences whether to correct in the other files such as ``ConfigurableAnalyzer``, ``LowerCaseKeywordAnalyzer`` and ``MultilingualAnalyzer``. Correcting creates larger noise diffs. I'm fine with normalizing formatting but is there a preferred protocol?


---

[GitHub] jena issue #436: JENA-1556 implementation

Posted by afs <gi...@git.apache.org>.

Github user afs commented on the issue:

    https://github.com/apache/jena/pull/436
  
    Staging now has links on the "Contributing" page to the workflow and to the release process.
    
    One last thing : can [Jena-1556](https://issues.apache.org/jira/browse/JENA-1556) be resolved or do you want to keep it open for something?  If it is marked "resolved", the during release it will be closed. This helps tracking all the changes in a release.



---

[GitHub] jena issue #436: JENA-1556 implementation

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on the issue:

    https://github.com/apache/jena/pull/436
  
    @afs I don't have write access to this github mirror repo and don't immediately see how work in the `git-wip-us.apache.org` so I'm not sure how to proceed with merging


---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/jena/pull/436


---

[GitHub] jena issue #436: JENA-1556 implementation

Posted by rvesse <gi...@git.apache.org>.

Github user rvesse commented on the issue:

    https://github.com/apache/jena/pull/436
  
    @xristy I think you can just merge as you would normally though you probably need to be explicit about the remote and branch you are using for remote operations:
    
    ```
    > git checkout master
    > git pull apache-origin master
    > git merge Jena-1556-MutilingualEnhancements-3.8.0
    > git push apache-origin master
    ```


---

[GitHub] jena issue #436: JENA-1556 implementation

Posted by kinow <gi...@git.apache.org>.

Github user kinow commented on the issue:

    https://github.com/apache/jena/pull/436
  
    @xristy if you'd like to merge it, and you cannot use the GitHub interface, then you can
    
    * Add a remote in your git repository using the developerConnection URL from the pom.xml (https://github.com/apache/jena/blob/master/pom.xml) but without the scm:git part (i.e. https://git-wip-us.apache.org/repos/asf/jena.git)
    * Merge the work and push to master
    * If the commit tree is OK, GitHub should understand this branch was merged. But if you changed or didn't rebase the code, you can still close it manually, or amend a commit and add somewhere in the message "This closes pr#436"
    
    HTH


---

[GitHub] jena pull request #436: JENA-1556 implementation

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on a diff in the pull request:

    https://github.com/apache/jena/pull/436#discussion_r195745831
  
    --- Diff: jena-text/src/main/java/org/apache/jena/query/text/analyzer/QueryMultilingualAnalyzer.java ---
    @@ -0,0 +1,75 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.jena.query.text.analyzer ;
    +
    +import org.apache.lucene.analysis.Analyzer ;
    +import org.apache.lucene.analysis.DelegatingAnalyzerWrapper;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +/** 
    + * Lucene Analyzer implementation that delegates to a language-specific
    + * Analyzer based on a field name suffix: e.g. field="label_en" will use
    + * an EnglishAnalyzer.
    + */
    +
    +public class QueryMultilingualAnalyzer extends DelegatingAnalyzerWrapper {
    +        private static Logger log = LoggerFactory.getLogger(QueryMultilingualAnalyzer.class);
    +        private Analyzer defaultAnalyzer;
    +        private String langTag;
    +
    +        public QueryMultilingualAnalyzer(Analyzer defaultAnalyzer) {
    +                super(PER_FIELD_REUSE_STRATEGY);
    +                this.defaultAnalyzer = defaultAnalyzer;
    +                this.langTag = null;
    +        }
    +
    +        public QueryMultilingualAnalyzer(Analyzer defaultAnalyzer, String tag) {
    +                super(PER_FIELD_REUSE_STRATEGY);
    +                this.defaultAnalyzer = defaultAnalyzer;
    +                this.langTag = tag;
    +        }
    +
    +        @Override
    +        /**
    +         * The analyzer corresponding to the langTag supplied at instantiation
    +         * is used to retrieve the analyzer to use regardless of the tag on the
    +         * fieldName. If no langTag is supplied then the tag on fieldName is
    +         * used to retrieve the analyzer as with the MultilingualAnalyzer
    +         * 
    +         * @param fieldName
    +         * @return the analyzer to use in the search
    +         */
    +        protected Analyzer getWrappedAnalyzer(String fieldName) {
    +                int idx = fieldName.lastIndexOf("_");
    +                if (idx == -1) { // not language-specific, e.g. "label"
    +                        return defaultAnalyzer;
    +                }
    +                String lang = langTag != null ? langTag : fieldName.substring(idx+1);
    +                Analyzer analyzer = Util.getLocalizedAnalyzer(lang);
    +                analyzer = analyzer != null ? analyzer : defaultAnalyzer;
    --- End diff --
    
    done. I need to become more familiar with ``StringUtils`` and ``ObjectUtils``


---

[GitHub] jena issue #436: JENA-1556 implementation

Posted by afs <gi...@git.apache.org>.

Github user afs commented on the issue:

    https://github.com/apache/jena/pull/436
  
    Thanks - it's only being near to a release that makes it of significance.


---