You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by tballison <gi...@git.apache.org> on 2018/02/26 16:29:35 UTC

[GitHub] lucene-solr pull request #328: SOLR-12034

GitHub user tballison opened a pull request:

    https://github.com/apache/lucene-solr/pull/328

    SOLR-12034

    First draft of SOLR-12034 -- not ready for committing. Some non-flaky tests are now failing.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tballison/lucene-solr jira/SOLR-12034

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/328.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #328
    
----
commit a118f5f9a0e3206d87e62394924d18bbf3b94fd3
Author: tballison <ta...@...>
Date:   2018-02-26T16:27:47Z

    SOLR-12034 -- first pass

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr issue #328: SOLR-12034

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
  
    Custom Analyzer was just added to have an easy-to-use Builder-Like API for Analyzers. It was not meant to replace SOlr's (although it would be nice, but it's impossible as I figured out at that time, too). Solr is based on modifiable classes and XML, not builders...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr issue #328: SOLR-12034

Posted by tballison <gi...@git.apache.org>.
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
  
    Wow...long time since I've visited this code.  Now I think I recall...the ugliness that I don't like imposing on the CustomAnalyzer's API is that it holds its own ResourceLoader and applies it when the user calls, e.g. `withTokenizer(class/classname, params)`, `add(Token|Char)Filter(class/classname, params)`.  
    
    In Solr, the charfilter, tokenizer, tokenfilter factories are fully built with resources loaded by `FieldTypePluginLoader`'s  `loader` a (`SolrResourceLoader`) in `readAnalyzer(Node node)` one by one...I think (???), and _then_ they are added to the `CustomAnalyzer`.
    
    I also see in `ManagedIndexSchema`, that there's `postReadInform()` which calls `informResourceLoaderAwareObjectsInChain`, which then loads the resources.
    
    So, when I break the API in CustomAnalyzer and make public, e.g. `addTokenFilter(TokenFilterFactory factory)`, there's an unused private variable `ResourceLoader loader`, which feels ugly...a user could both specify a resource loader in `Builder`'s initializer and then pass in fully loaded components that would bypass that resource loader.  This smells bad to me...
    
    Any recommendations?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr issue #328: SOLR-12034

Posted by romseygeek <gi...@git.apache.org>.
Github user romseygeek commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
  
    The lucene-level changes are there so that you can take an existing CustomAnalyzer and tweak it, if I'm reading correctly.  Can you instead add a constructor to CustomAnalyzer.Builder that takes a CustomAnalyzer?  I think that would simplify things.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #328: SOLR-12034

Posted by tballison <gi...@git.apache.org>.
Github user tballison closed the pull request at:

    https://github.com/apache/lucene-solr/pull/328


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr issue #328: SOLR-12034

Posted by tballison <gi...@git.apache.org>.
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
  
    @romseygeek , y, happy to fix/update this.  I'll take a look later today.
    
    Part of the reason I gave up on this is that I didn't like the changes I had to make at the Lucene level.  It felt like I was screwing up the elegant Lucene-level API.  Any recommendations?
    
    Also, @dsmiley recommended I move the Lucene-level modifications to another issue.  Are you ok if these go in as one, or should I open up a separate PR for the Lucene-level mods?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr issue #328: SOLR-12034

Posted by romseygeek <gi...@git.apache.org>.
Github user romseygeek commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
  
    I think we can proceed with LUCENE-8497 anyway by adding a new method to TokenizerChain that only uses multi-term aware filters.  Feel free to close this one @tballison .


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: [GitHub] lucene-solr issue #328: SOLR-12034

Posted by Alan Woodward <ro...@gmail.com>.
See LUCENE-8497 for more details.  Mayya would like to replace the marker interface with type-safe methods on CharFilterFactory and TokenFilterFactory

> On 1 Oct 2018, at 16:15, Erick Erickson <er...@gmail.com> wrote:
> 
> Just skimming here and on my way to the airport. The original purpose
> of MultiTermAwareComponent was to be able to, for instance, search
> wildcards without having to, say, lowercase the term on the client side
> (remember all the questions like "Why does a search for 'powerful' succeed
> but not 'Power*' ?"). I'm assuming this change won't alter that behavior....
> 
> 
> On Mon, Oct 1, 2018 at 1:06 AM romseygeek <gi...@git.apache.org> wrote:
>> 
>> Github user romseygeek commented on the issue:
>> 
>>   https://github.com/apache/lucene-solr/pull/328
>> 
>>   Hi @tballison , are you still interested in pushing this one?  I can help out, as I'd like to get this committed so that we can move forward with LUCENE-8497 and remove the `MultiTermAwareComponent` interface.
>> 
>> 
>> ---
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: [GitHub] lucene-solr issue #328: SOLR-12034

Posted by Erick Erickson <er...@gmail.com>.
Just skimming here and on my way to the airport. The original purpose
of MultiTermAwareComponent was to be able to, for instance, search
wildcards without having to, say, lowercase the term on the client side
(remember all the questions like "Why does a search for 'powerful' succeed
but not 'Power*' ?"). I'm assuming this change won't alter that behavior....


On Mon, Oct 1, 2018 at 1:06 AM romseygeek <gi...@git.apache.org> wrote:
>
> Github user romseygeek commented on the issue:
>
>     https://github.com/apache/lucene-solr/pull/328
>
>     Hi @tballison , are you still interested in pushing this one?  I can help out, as I'd like to get this committed so that we can move forward with LUCENE-8497 and remove the `MultiTermAwareComponent` interface.
>
>
> ---
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr issue #328: SOLR-12034

Posted by romseygeek <gi...@git.apache.org>.
Github user romseygeek commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
  
    Hi @tballison , are you still interested in pushing this one?  I can help out, as I'd like to get this committed so that we can move forward with LUCENE-8497 and remove the `MultiTermAwareComponent` interface.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr issue #328: SOLR-12034

Posted by tballison <gi...@git.apache.org>.
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
  
    Thank you, @romseygeek , for thinking of this PR.  I'm closing it because I don't want to wreck the API of CustomAnalyzer.Builder().


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr issue #328: SOLR-12034

Posted by tballison <gi...@git.apache.org>.
Github user tballison commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
  
    I'll leave this open a bit for discussion in case someone can think of a solution.  My intuition is in accord with @uschindler 's judgment.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr issue #328: SOLR-12034

Posted by uschindler <gi...@git.apache.org>.
Github user uschindler commented on the issue:

    https://github.com/apache/lucene-solr/pull/328
  
    If you can't make Solr work with the current public CustomAnalyzer API, please keep it as is and use your own TokenizerChain in Solr. Please don't make CustomAnalyzer unmodifiable or add access to internal fields! In fact, it's a minimum amount of code behind the 3 lists of factories that build up the Analyzer that does not justify cluttering Lucene's API (like the horrible MultiTermAwareComponent added by Solr).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org