You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jena.apache.org by xristy <gi...@git.apache.org> on 2018/03/15 19:20:44 UTC

[GitHub] jena pull request #385: defined filters and tokenizers for ConfigurableAnaly...

GitHub user xristy opened a pull request:

    https://github.com/apache/jena/pull/385

    defined filters and tokenizers for ConfigurableAnalyzer

    This PR implements [JENA-1506](https://issues.apache.org/jira/browse/JENA-1506). See the issue for details.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BuddhistDigitalResourceCenter/jena JENA-1506-PR

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/385.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #385
    
----
commit 795b9eb7cb45999c1d884bcd84c83896a498ed87
Author: Chris Tomlinson <ct...@...>
Date:   2018-03-15T19:12:53Z

    Merged JENA-1506-definedFilters

----


---

[GitHub] jena issue #385: defined filters and tokenizers for ConfigurableAnalyzer

Posted by rvesse <gi...@git.apache.org>.
Github user rvesse commented on the issue:

    https://github.com/apache/jena/pull/385
  
    Btw as you are now a committer you can follow the instructions at https://reference.apache.org/committer/github to get added to the Apache GitHub org so you'll show up as `Member` and not just as `Contributor` on here


---

[GitHub] jena issue #385: defined filters and tokenizers for ConfigurableAnalyzer

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/385
  
    Hey, @xristy, you want to push instead to the `master` branch at Apache:
    
    https://git-wip-us.apache.org/repos/asf/jena.git


---

[GitHub] jena issue #385: defined filters and tokenizers for ConfigurableAnalyzer

Posted by rvesse <gi...@git.apache.org>.
Github user rvesse commented on the issue:

    https://github.com/apache/jena/pull/385
  
    Looks like a really nice extension to the current machinery!
    
    I notice there is a unit test for the assembler support but no unit tests for using some of these new features, would it be possible to include some of those to verify that this actually allows addressing the use cases that led to this PR?
    
    One other thought is whether the explicit param type definitions are necessary for the simpler cases (integer, boolean) or whether we could just rely on literal datatypes to reduce the verbosity of the configuration for those simpler cases?  For the complex cases like sets, token streams, analysers etc then explicit types makes total sense.


---

[GitHub] jena pull request #385: defined filters and tokenizers for ConfigurableAnaly...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/jena/pull/385


---

[GitHub] jena issue #385: defined filters and tokenizers for ConfigurableAnalyzer

Posted by rvesse <gi...@git.apache.org>.
Github user rvesse commented on the issue:

    https://github.com/apache/jena/pull/385
  
    @xristy Yes I don't think there is any harm in supporting both the explicit form as well as supporting literal datatypes directly


---

[GitHub] jena issue #385: defined filters and tokenizers for ConfigurableAnalyzer

Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/385
  
    To be clear, what is here at Github is essentially a read-only mirror. What's at Apache is the real deal. We've talked about reversing that mirroring relationship (so that we could merge PRs here at Github) but there are other consequences to so doing, mostly involving integrations with other services like Jira.


---

[GitHub] jena issue #385: defined filters and tokenizers for ConfigurableAnalyzer

Posted by kinow <gi...@git.apache.org>.
Github user kinow commented on the issue:

    https://github.com/apache/jena/pull/385
  
    +1 @rvesse comments. And +1 for the great PR @xristy !!! Tested with the WIP `SelectiveFoldingFilter` for JENA-1488, and it worked like a charm!!!!! :tada: :tada: :tada: 
    
    First created a normal JenaText configuration, and everything worked with no issues. Then replaced my `analyzer` by a `DefinedAnalyzer`... had a few hiccups forgetting to create a tokenizer, and had to update the filter to use a `CharArraySet` (much better that way I think).
    
    But in the end got the query working as expected. Confirmed with Luke that the contents were indexed correctly with the new filter... did a quick search in Luke, then used Fuseki, and everything worked as expected.
    
    Once this one is merged, we'll be ready for another PR for JENA-1488. In the meantime, I will start working on documentation and tests !
    
    Thanks!
    Bruno


---

[GitHub] jena issue #385: defined filters and tokenizers for ConfigurableAnalyzer

Posted by xristy <gi...@git.apache.org>.
Github user xristy commented on the issue:

    https://github.com/apache/jena/pull/385
  
    The parameter types: `int`, `boolean` and `String` may be derived from the `text:paramValue`. See [JENA-1506](https://issues.apache.org/jira/browse/JENA-1506) for details.


---

[GitHub] jena issue #385: defined filters and tokenizers for ConfigurableAnalyzer

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:

    https://github.com/apache/jena/pull/385
  
    Workflow doucmentation: https://cwiki.apache.org/confluence/display/JENA/Commit+Workflow+for+Github-ASF


---

[GitHub] jena issue #385: defined filters and tokenizers for ConfigurableAnalyzer

Posted by xristy <gi...@git.apache.org>.
Github user xristy commented on the issue:

    https://github.com/apache/jena/pull/385
  
    @rvesse  On the matter of the tests, I just updated the ticket on that matter:
    
    > There are already a suite of tests associated with TestDatasetWithConfigurableAnalyzer that exercise the new logic for identifying tokenizers and filters and instantiating them so this part of the extension is tested there.
    
    As for using literal datatypes for String, int, and boolean, I think that's doable. I tend to get slavish in following a pattern so I'm inclined to permit either the explicit parameter types as currently implemented and add the use of literal datatypes as an alternative when applicable.
    
    I added github id per your suggestion, thanks.



---

[GitHub] jena issue #385: defined filters and tokenizers for ConfigurableAnalyzer

Posted by osma <gi...@git.apache.org>.
Github user osma commented on the issue:

    https://github.com/apache/jena/pull/385
  
    Wow, quite a big patch for a "straightforward" change :) Great work @xristy !
    
    +1 to @rvesse's suggestion to simplify by making use of literal data types, if it's possible.
    
    Thanks for the tip @rvesse , I also added my GH username to the ASF profile.


---

[GitHub] jena issue #385: defined filters and tokenizers for ConfigurableAnalyzer

Posted by xristy <gi...@git.apache.org>.
Github user xristy commented on the issue:

    https://github.com/apache/jena/pull/385
  
    I'm guessing that I'm supposed to perform the pull request but when I try to push to apache/jena master on github my github user, xristy, doesn't have permission for apache/jena.git.
    
    What is the proper procedure?


---