You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by diegoceccarelli <gi...@git.apache.org> on 2018/10/22 23:09:13 UTC

[GitHub] lucene-solr pull request #482: LUCENE-8539: fix some typos and improve style

GitHub user diegoceccarelli opened a pull request:

    https://github.com/apache/lucene-solr/pull/482

    LUCENE-8539: fix some typos and improve style

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/diegoceccarelli/lucene-solr LUCENE-8539

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/482.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #482
    
----
commit b007debb2ea995c6878c0d74385bf31951710b5a
Author: Diego Ceccarelli <dc...@...>
Date:   2018-10-22T23:08:51Z

    LUCENE-8539: fix some typos and improve style

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr pull request #482: LUCENE-8539: fix some typos and improve style...

Posted by alessandrobenedetti <gi...@git.apache.org>.

Github user alessandrobenedetti commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/482#discussion_r227469439
  
    --- Diff: lucene/core/src/test/org/apache/lucene/analysis/TestStopFilter.java ---
    @@ -47,58 +48,65 @@ public void testStopFilt() throws IOException {
         assertTokenStreamContents(stream, new String[] { "Now", "The" });
       }
     
    +
    +  private void logStopwords(String name, List<String> stopwords){
    +    // helper method: converts a list
    +    log(String.format("stopword list \"%s:\"", name));
    +    for (int i = 0; i < stopwords.size(); i++) {
    +      log(String.format("stopword (%d): %s ", i, stopwords.get(i)));
    +    }
    +    log("----------");
    +  }
       /**
        * Test Position increments applied by StopFilter with and without enabling this option.
        */
    -  public void testStopPositons() throws IOException {
    +  public void testStopPositions() throws IOException {
    +    final int NUMBER_OF_TOKENS = 20;
         StringBuilder sb = new StringBuilder();
    -    ArrayList<String> a = new ArrayList<>();
    -    for (int i=0; i<20; i++) {
    -      String w = English.intToEnglish(i).trim();
    -      sb.append(w).append(" ");
    -      if (i%3 != 0) a.add(w);
    +    List<String> stopwords = new ArrayList<>(NUMBER_OF_TOKENS);
    +    for (int i = 0; i < NUMBER_OF_TOKENS; i++) {
    +      String token = English.intToEnglish(i).trim();
    +      sb.append(token).append(' ');
    +      if (i%3 != 0) stopwords.add(token);
         }
         log(sb.toString());
    -    String stopWords[] = a.toArray(new String[0]);
    -    for (int i=0; i<a.size(); i++) log("Stop: "+stopWords[i]);
    -    CharArraySet stopSet = StopFilter.makeStopSet(stopWords);
    +    CharArraySet stopSet = StopFilter.makeStopSet(stopwords);
    +    logStopwords("All stopwords", stopwords);
         // with increments
         StringReader reader = new StringReader(sb.toString());
         final MockTokenizer in = new MockTokenizer(MockTokenizer.WHITESPACE, false);
         in.setReader(reader);
    -    StopFilter stpf = new StopFilter(in, stopSet);
    -    doTestStopPositons(stpf);
    +    StopFilter stopfilter = new StopFilter(in, stopSet);
    +    doTestStopwordsPositions(stopfilter);
         // with increments, concatenating two stop filters
    -    ArrayList<String> a0 = new ArrayList<>();
    -    ArrayList<String> a1 = new ArrayList<>();
    -    for (int i=0; i<a.size(); i++) {
    -      if (i%2==0) { 
    -        a0.add(a.get(i));
    +    List<String> evenStopwords = new ArrayList<>(stopwords.size());
    +    List<String> oddStopwords = new ArrayList<>(stopwords.size());
    +    for (int i=0; i < stopwords.size(); i++) {
    +      if (i%2 == 0) {
    +        evenStopwords.add(stopwords.get(i));
           } else {
    -        a1.add(a.get(i));
    +        oddStopwords.add(stopwords.get(i));
           }
         }
    -    String stopWords0[] =  a0.toArray(new String[0]);
    -    for (int i=0; i<a0.size(); i++) log("Stop0: "+stopWords0[i]);
    -    String stopWords1[] =  a1.toArray(new String[0]);
    -    for (int i=0; i<a1.size(); i++) log("Stop1: "+stopWords1[i]);
    -    CharArraySet stopSet0 = StopFilter.makeStopSet(stopWords0);
    -    CharArraySet stopSet1 = StopFilter.makeStopSet(stopWords1);
    +    CharArraySet evenStopSet = StopFilter.makeStopSet(evenStopwords);
    +    logStopwords("Even stopwords", evenStopwords);
    +    CharArraySet oddStopSet = StopFilter.makeStopSet(oddStopwords);
    +    logStopwords("Odd stopwords", oddStopwords);
         reader = new StringReader(sb.toString());
         final MockTokenizer in1 = new MockTokenizer(MockTokenizer.WHITESPACE, false);
         in1.setReader(reader);
    -    StopFilter stpf0 = new StopFilter(in1, stopSet0); // first part of the set
    -    StopFilter stpf01 = new StopFilter(stpf0, stopSet1); // two stop filters concatenated!
    -    doTestStopPositons(stpf01);
    +    StopFilter evenStopFilter = new StopFilter(in1, evenStopSet); // first part of the set
    +    StopFilter oddStopFilter = new StopFilter(evenStopFilter, oddStopSet); // two stop filters concatenated!
    --- End diff --
    
    maybe renaming this to concatenatedStopFilter because it cointains the entire stop filter set through 2 stop filters and not just the odd ones ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr pull request #482: LUCENE-8539: fix some typos and improve style...

Posted by alessandrobenedetti <gi...@git.apache.org>.

Github user alessandrobenedetti commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/482#discussion_r227464824
  
    --- Diff: lucene/core/src/test/org/apache/lucene/analysis/TestStopFilter.java ---
    @@ -111,20 +119,24 @@ public void testEndStopword() throws Exception {
                                   null);
       }
     
    -  private void doTestStopPositons(StopFilter stpf) throws IOException {
    -    CharTermAttribute termAtt = stpf.getAttribute(CharTermAttribute.class);
    -    PositionIncrementAttribute posIncrAtt = stpf.getAttribute(PositionIncrementAttribute.class);
    -    stpf.reset();
    -    for (int i=0; i<20; i+=3) {
    -      assertTrue(stpf.incrementToken());
    -      log("Token "+i+": "+stpf);
    -      String w = English.intToEnglish(i).trim();
    -      assertEquals("expecting token "+i+" to be "+w,w,termAtt.toString());
    -      assertEquals("all but first token must have position increment of 3",i==0?1:3,posIncrAtt.getPositionIncrement());
    +  private void doTestStopwordsPositions(StopFilter stopfilter) throws IOException {
    +    final int NUMBER_OF_TOKENS = 20;
    +    final int DELTA = 3;
    --- End diff --
    
    Given the fact that this was in the original code, and you just refactored, I don't like that much the fact that Delta=3 here is not a method parameter.
    In fact I doubt this method is really usable if the stop filter you pass is not precisely "all but divisible by 3" stop filter.
    I would make this method parametric to make it clearly related the numerical nature of the stop filter in input


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr issue #482: LUCENE-8539: fix some typos and improve style in Tes...

Posted by mikemccand <gi...@git.apache.org>.

Github user mikemccand commented on the issue:

    https://github.com/apache/lucene-solr/pull/482
  
    Great, thanks @diegoceccarelli -- I will merge soon!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr pull request #482: LUCENE-8539: fix some typos and improve style...

Posted by alessandrobenedetti <gi...@git.apache.org>.

Github user alessandrobenedetti commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/482#discussion_r227461971
  
    --- Diff: lucene/core/src/test/org/apache/lucene/analysis/TestStopFilter.java ---
    @@ -47,58 +48,65 @@ public void testStopFilt() throws IOException {
         assertTokenStreamContents(stream, new String[] { "Now", "The" });
       }
     
    +
    +  private void logStopwords(String name, List<String> stopwords){
    +    // helper method: converts a list
    +    log(String.format("stopword list \"%s:\"", name));
    +    for (int i = 0; i < stopwords.size(); i++) {
    +      log(String.format("stopword (%d): %s ", i, stopwords.get(i)));
    +    }
    +    log("----------");
    +  }
       /**
        * Test Position increments applied by StopFilter with and without enabling this option.
        */
    -  public void testStopPositons() throws IOException {
    +  public void testStopPositions() throws IOException {
    +    final int NUMBER_OF_TOKENS = 20;
         StringBuilder sb = new StringBuilder();
    -    ArrayList<String> a = new ArrayList<>();
    -    for (int i=0; i<20; i++) {
    -      String w = English.intToEnglish(i).trim();
    -      sb.append(w).append(" ");
    -      if (i%3 != 0) a.add(w);
    +    List<String> stopwords = new ArrayList<>(NUMBER_OF_TOKENS);
    --- End diff --
    
    Maybe renaming sb -> inputText ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr pull request #482: LUCENE-8539: fix some typos and improve style...

Posted by alessandrobenedetti <gi...@git.apache.org>.

Github user alessandrobenedetti commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/482#discussion_r227471350
  
    --- Diff: lucene/core/src/test/org/apache/lucene/analysis/TestStopFilter.java ---
    @@ -47,58 +48,65 @@ public void testStopFilt() throws IOException {
         assertTokenStreamContents(stream, new String[] { "Now", "The" });
       }
     
    +
    +  private void logStopwords(String name, List<String> stopwords){
    +    // helper method: converts a list
    +    log(String.format("stopword list \"%s:\"", name));
    +    for (int i = 0; i < stopwords.size(); i++) {
    +      log(String.format("stopword (%d): %s ", i, stopwords.get(i)));
    +    }
    +    log("----------");
    +  }
       /**
        * Test Position increments applied by StopFilter with and without enabling this option.
        */
    -  public void testStopPositons() throws IOException {
    +  public void testStopPositions() throws IOException {
    +    final int NUMBER_OF_TOKENS = 20;
         StringBuilder sb = new StringBuilder();
    -    ArrayList<String> a = new ArrayList<>();
    -    for (int i=0; i<20; i++) {
    -      String w = English.intToEnglish(i).trim();
    -      sb.append(w).append(" ");
    -      if (i%3 != 0) a.add(w);
    +    List<String> stopwords = new ArrayList<>(NUMBER_OF_TOKENS);
    +    for (int i = 0; i < NUMBER_OF_TOKENS; i++) {
    +      String token = English.intToEnglish(i).trim();
    +      sb.append(token).append(' ');
    +      if (i%3 != 0) stopwords.add(token);
         }
         log(sb.toString());
    -    String stopWords[] = a.toArray(new String[0]);
    -    for (int i=0; i<a.size(); i++) log("Stop: "+stopWords[i]);
    -    CharArraySet stopSet = StopFilter.makeStopSet(stopWords);
    +    CharArraySet stopSet = StopFilter.makeStopSet(stopwords);
    +    logStopwords("All stopwords", stopwords);
         // with increments
         StringReader reader = new StringReader(sb.toString());
         final MockTokenizer in = new MockTokenizer(MockTokenizer.WHITESPACE, false);
         in.setReader(reader);
    -    StopFilter stpf = new StopFilter(in, stopSet);
    -    doTestStopPositons(stpf);
    +    StopFilter stopfilter = new StopFilter(in, stopSet);
    +    doTestStopwordsPositions(stopfilter);
         // with increments, concatenating two stop filters
    -    ArrayList<String> a0 = new ArrayList<>();
    -    ArrayList<String> a1 = new ArrayList<>();
    -    for (int i=0; i<a.size(); i++) {
    -      if (i%2==0) { 
    -        a0.add(a.get(i));
    +    List<String> evenStopwords = new ArrayList<>(stopwords.size());
    +    List<String> oddStopwords = new ArrayList<>(stopwords.size());
    +    for (int i=0; i < stopwords.size(); i++) {
    +      if (i%2 == 0) {
    +        evenStopwords.add(stopwords.get(i));
           } else {
    -        a1.add(a.get(i));
    +        oddStopwords.add(stopwords.get(i));
           }
         }
    -    String stopWords0[] =  a0.toArray(new String[0]);
    -    for (int i=0; i<a0.size(); i++) log("Stop0: "+stopWords0[i]);
    -    String stopWords1[] =  a1.toArray(new String[0]);
    -    for (int i=0; i<a1.size(); i++) log("Stop1: "+stopWords1[i]);
    -    CharArraySet stopSet0 = StopFilter.makeStopSet(stopWords0);
    -    CharArraySet stopSet1 = StopFilter.makeStopSet(stopWords1);
    +    CharArraySet evenStopSet = StopFilter.makeStopSet(evenStopwords);
    --- End diff --
    
    I guess the reason behind the naming of "evenStopWords" and "oddStopWords" is because of the way they are extracted from the original stop set.
    But being the original stopwords set all numbers except the ones divisible by 3, what ends up in the even and odd stopword set is not exactly even or odd.
    maybe something like :
    firstStopwordsHalf
    secondStopwordsHalf
    but I lack of fantasy at the moment to identify good names :)
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr issue #482: LUCENE-8539: fix some typos and improve style in Tes...

Posted by diegoceccarelli <gi...@git.apache.org>.

Github user diegoceccarelli commented on the issue:

    https://github.com/apache/lucene-solr/pull/482
  
    Thanks @mikemccand, I added a new test case by splitting one that was testing two different things :) I rebased to the latest master, Is anything else I can do to get this merged? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr pull request #482: LUCENE-8539: fix some typos and improve style...

Posted by alessandrobenedetti <gi...@git.apache.org>.

Github user alessandrobenedetti commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/482#discussion_r227462125
  
    --- Diff: lucene/core/src/test/org/apache/lucene/analysis/TestStopFilter.java ---
    @@ -47,58 +48,65 @@ public void testStopFilt() throws IOException {
         assertTokenStreamContents(stream, new String[] { "Now", "The" });
       }
     
    +
    +  private void logStopwords(String name, List<String> stopwords){
    +    // helper method: converts a list
    +    log(String.format("stopword list \"%s:\"", name));
    +    for (int i = 0; i < stopwords.size(); i++) {
    +      log(String.format("stopword (%d): %s ", i, stopwords.get(i)));
    +    }
    +    log("----------");
    +  }
       /**
        * Test Position increments applied by StopFilter with and without enabling this option.
        */
    -  public void testStopPositons() throws IOException {
    +  public void testStopPositions() throws IOException {
    +    final int NUMBER_OF_TOKENS = 20;
         StringBuilder sb = new StringBuilder();
    --- End diff --
    
    Maybe renaming sb -> inputText ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr issue #482: LUCENE-8539: fix some typos and improve style in Tes...

Posted by mikemccand <gi...@git.apache.org>.

Github user mikemccand commented on the issue:

    https://github.com/apache/lucene-solr/pull/482
  
    @diegoceccarelli this looks great -- I'll merge soon.  Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr issue #482: LUCENE-8539: fix some typos and improve style in Tes...

Posted by mikemccand <gi...@git.apache.org>.

Github user mikemccand commented on the issue:

    https://github.com/apache/lucene-solr/pull/482
  
    Hmm `and precommit` is a little angry:
    
    ```
    [forbidden-apis] Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]
    [forbidden-apis]   in org.apache.lucene.analysis.TestStopFilter (TestStopFilter.java:61)
    [forbidden-apis] Forbidden method invocation: java.util.Collections#shuffle(java.util.List) [Use shuffle(List, Random) instead so that it can be reproduced]
    [forbidden-apis]   in org.apache.lucene.analysis.TestStopFilter (TestStopFilter.java:130)
    [forbidden-apis] Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]
    [forbidden-apis]   in org.apache.lucene.analysis.TestStopFilter (TestStopFilter.java:184)
    [forbidden-apis] Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]
    [forbidden-apis]   in org.apache.lucene.analysis.TestStopFilter (TestStopFilter.java:186)
    ```
    
    Can you address these issues and commit / squash?  Thanks @diegoceccarelli!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr issue #482: LUCENE-8539: fix some typos and improve style in Tes...

Posted by diegoceccarelli <gi...@git.apache.org>.

Github user diegoceccarelli commented on the issue:

    https://github.com/apache/lucene-solr/pull/482
  
    Thanks @mikemccand, now precommit seems happy, I squashed and rebased to the latest master. 



---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org