Posted to dev@lucene.apache.org by andyetitmoves <gi...@git.apache.org> on 2014/11/24 01:35:10 UTC

[GitHub] lucene-solr pull request: SOLR-4792: stop shipping a war in 5.0

GitHub user andyetitmoves opened a pull request:

    https://github.com/apache/lucene-solr/pull/107

    SOLR-4792: stop shipping a war in 5.0

    Patch for SOLR-4792.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bloomberg/lucene-solr branch_5x-stop-ship-war

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/107.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #107
    
----
commit 398f74fcf1a5e92d166a6517452f16a040ccb501
Author: Robert Muir <rm...@apache.org>
Date:   2013-06-08T19:08:17Z

    SOLR-4792: stop shipping a war in 5.0

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #107: SOLR-9708 UnifiedHighlighter Solr Plugin

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/107#discussion_r87743142
  
    --- Diff: solr/core/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java ---
    @@ -373,6 +373,11 @@ protected BoundaryScanner getBoundaryScanner(String fieldName, SolrParams params
         if (!isHighlightingEnabled(params)) // also returns early if no unique key field
           return null;
     
    +    boolean rewrite = query != null && !(Boolean.valueOf(params.get(HighlightParams.USE_PHRASE_HIGHLIGHTER, "true")) &&
    --- End diff --
    
    no biggie but I think a simple if(...) condition would be simpler; no variable.


[GitHub] lucene-solr pull request: SOLR-4792: stop shipping a war in 5.0

Posted by andyetitmoves <gi...@git.apache.org>.
Github user andyetitmoves closed the pull request at:

    https://github.com/apache/lucene-solr/pull/107


[GitHub] lucene-solr pull request #107: SOLR-9708 UnifiedHighlighter Solr Plugin

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/107#discussion_r87740491
  
    --- Diff: solr/core/src/test/org/apache/solr/highlight/TestUnifiedSolrHighlighter.java ---
    @@ -0,0 +1,222 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.highlight;
    +
    +import org.apache.solr.SolrTestCaseJ4;
    +import org.apache.solr.handler.component.HighlightComponent;
    +import org.apache.solr.schema.IndexSchema;
    +import org.junit.BeforeClass;
    +import org.junit.Ignore;
    +
    +/** simple tests for PostingsSolrHighlighter */
    +public class TestUnifiedSolrHighlighter extends SolrTestCaseJ4 {
    +  
    +  @BeforeClass
    +  public static void beforeClass() throws Exception {
    +    initCore("solrconfig-unifiedhighlight.xml", "schema-unifiedhighlight.xml");
    +    
    +    // test our config is sane, just to be sure:
    +    
    +    // postingshighlighter should be used
    +    SolrHighlighter highlighter = HighlightComponent.getHighlighter(h.getCore());
    +    assertTrue("wrong highlighter: " + highlighter.getClass(), highlighter instanceof UnifiedSolrHighlighter);
    +    
    +    // 'text' and 'text3' should have offsets, 'text2' should not
    +    IndexSchema schema = h.getCore().getLatestSchema();
    +    assertTrue(schema.getField("text").storeOffsetsWithPositions());
    +    assertTrue(schema.getField("text3").storeOffsetsWithPositions());
    +    assertFalse(schema.getField("text2").storeOffsetsWithPositions());
    +  }
    +  
    +  @Override
    +  public void setUp() throws Exception {
    +    super.setUp();
    +    clearIndex();
    +    assertU(adoc("text", "document one", "text2", "document one", "text3", "crappy document", "id", "101"));
    +    assertU(adoc("text", "second document", "text2", "second document", "text3", "crappier document", "id", "102"));
    +    assertU(commit());
    +  }
    +  
    +  public void testSimple() {
    +    assertQ("simplest test", 
    +        req("q", "text:document", "sort", "id asc", "hl", "true"),
    +        "count(//lst[@name='highlighting']/*)=2",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em> one'",
    +        "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'");
    +  }
    +
    +  public void testMultipleSnippetsReturned() {
    +    clearIndex();
    +    assertU(adoc("text", "Document snippet one. Intermediate sentence. Document snippet two.",
    +        "text2", "document one", "text3", "crappy document", "id", "101"));
    +    assertU(commit());
    +    assertQ("multiple snippets test",
    +        req("q", "text:document", "sort", "id asc", "hl", "true", "hl.snippets", "2", "hl.bs.type", "SENTENCE"),
    +        "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=2",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Document</em> snippet one. '",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr/str[2]='<em>Document</em> snippet two.'");
    +  }
    +
    +  public void testStrictPhrasesEnabledByDefault() {
    +    clearIndex();
    +    assertU(adoc("text", "Strict phrases should be enabled for phrases",
    +        "text2", "document one", "text3", "crappy document", "id", "101"));
    +    assertU(commit());
    +    assertQ("strict phrase handling",
    +        req("q", "text:\"strict phrases\"", "sort", "id asc", "hl", "true"),
    +        "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Strict</em> <em>phrases</em> should be enabled for phrases'");
    +  }
    +
    +  public void testStrictPhrasesCanBeDisabled() {
    +    clearIndex();
    +    assertU(adoc("text", "Strict phrases should be disabled for phrases",
    +        "text2", "document one", "text3", "crappy document", "id", "101"));
    +    assertU(commit());
    +    assertQ("strict phrase handling",
    +        req("q", "text:\"strict phrases\"", "sort", "id asc", "hl", "true", "hl.usePhraseHighlighter", "false"),
    +        "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Strict</em> <em>phrases</em> should be disabled for <em>phrases</em>'");
    +  }
    +
    +  public void testMultiTermQueryEnabledByDefault() {
    +    clearIndex();
    +    assertU(adoc("text", "Aviary Avenue document",
    +        "text2", "document one", "text3", "crappy document", "id", "101"));
    +    assertU(commit());
    +    assertQ("multi term query handling",
    +        req("q", "text:av*", "sort", "id asc", "hl", "true"),
    +        "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Aviary</em> <em>Avenue</em> document'");
    +  }
    +
    +  public void testMultiTermQueryCanBeDisabled() {
    +    clearIndex();
    +    assertU(adoc("text", "Aviary Avenue document",
    +        "text2", "document one", "text3", "crappy document", "id", "101"));
    +    assertU(commit());
    +    assertQ("multi term query handling",
    +        req("q", "text:av*", "sort", "id asc", "hl", "true", "hl.highlightMultiTerm", "false"),
    +        "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=0");
    +  }
    +
    +  public void testPagination() {
    +    assertQ("pagination test", 
    +        req("q", "text:document", "sort", "id asc", "hl", "true", "rows", "1", "start", "1"),
    +        "count(//lst[@name='highlighting']/*)=1",
    +        "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'");
    +  }
    +  
    +  public void testEmptySnippet() {
    +    assertQ("null snippet test", 
    +      req("q", "text:one OR *:*", "sort", "id asc", "hl", "true"),
    +        "count(//lst[@name='highlighting']/*)=2",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='document <em>one</em>'",
    +        "count(//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/*)=0");
    +  }
    +  
    +  public void testDefaultSummary() {
    +    assertQ("null snippet test", 
    +      req("q", "text:one OR *:*", "sort", "id asc", "hl", "true", "hl.defaultSummary", "true"),
    +        "count(//lst[@name='highlighting']/*)=2",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='document <em>one</em>'",
    +        "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second document'");
    +  }
    +  
    +  public void testDifferentField() {
    +    assertQ("highlighting text3", 
    +        req("q", "text3:document", "sort", "id asc", "hl", "true", "hl.fl", "text3"),
    +        "count(//lst[@name='highlighting']/*)=2",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text3']/str='crappy <em>document</em>'",
    +        "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text3']/str='crappier <em>document</em>'");
    +  }
    +  
    +  public void testTwoFields() {
    +    assertQ("highlighting text and text3", 
    +        req("q", "text:document text3:document", "sort", "id asc", "hl", "true", "hl.fl", "text,text3"),
    +        "count(//lst[@name='highlighting']/*)=2",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em> one'",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text3']/str='crappy <em>document</em>'",
    +        "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'",
    +        "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text3']/str='crappier <em>document</em>'");
    +  }
    +
    +  //todo: need to configure field that is not at least stored, hence no analysis
    +  //otherwise, this highlighter is resilient
    +  @Ignore
    --- End diff --
    
    It seems this test should be dropped. Indeed, this highlighter is resilient; the field just needs to be stored.
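The two strict-phrase tests quoted above expect different markup depending on hl.usePhraseHighlighter. The following is a toy model, not Solr or Lucene code: whitespace tokenization and case-insensitive matching are simplifying assumptions, and the class and method names are hypothetical. It shows the distinction the tests encode: with strict phrases only the positional match "Strict phrases" is marked, while without it every occurrence of either term is marked.

```java
/**
 * Toy model (not Solr or Lucene code) of what hl.usePhraseHighlighter changes
 * for the query text:"strict phrases". Whitespace tokenization and
 * case-insensitive matching are simplifying assumptions.
 */
public class PhraseHighlightDemo {

  static String highlight(String text, String[] phrase, boolean strictPhrases) {
    String[] tokens = text.split(" ");
    boolean[] marked = new boolean[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      if (strictPhrases) {
        // Strict: mark tokens only where the whole phrase occurs at consecutive positions.
        boolean match = i + phrase.length <= tokens.length;
        for (int j = 0; match && j < phrase.length; j++) {
          match = tokens[i + j].equalsIgnoreCase(phrase[j]);
        }
        if (match) {
          for (int j = 0; j < phrase.length; j++) {
            marked[i + j] = true;
          }
        }
      } else {
        // Non-strict: mark every occurrence of any phrase term, wherever it appears.
        for (String term : phrase) {
          if (tokens[i].equalsIgnoreCase(term)) {
            marked[i] = true;
          }
        }
      }
    }
    // Re-join the tokens, wrapping marked ones in <em> tags.
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < tokens.length; i++) {
      if (i > 0) {
        sb.append(' ');
      }
      sb.append(marked[i] ? "<em>" + tokens[i] + "</em>" : tokens[i]);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String[] phrase = {"strict", "phrases"};
    System.out.println(highlight("Strict phrases should be enabled for phrases", phrase, true));
    System.out.println(highlight("Strict phrases should be disabled for phrases", phrase, false));
  }
}
```

Running main prints the same two strings that testStrictPhrasesEnabledByDefault and testStrictPhrasesCanBeDisabled assert.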


[GitHub] lucene-solr pull request #107: SOLR-9708 UnifiedHighlighter Solr Plugin

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/107#discussion_r89039004
  
    --- Diff: solr/core/src/java/org/apache/solr/handler/component/HighlightComponent.java ---
    @@ -184,6 +185,20 @@ public void process(ResponseBuilder rb) throws IOException {
         }
       }
     
    +  /**
    +   * Normalizes parameters between highlighters
    +   */
    +  private SolrParams normalizeParameters(SolrParams params) {
    --- End diff --
    
    You've coded this such that SIMPLE_PRE overrides TAG_PRE, which I don't think is what we want. Furthermore, the override only happens at the global level, which won't work for field-specific settings: for `f.myfieldname.hl.tag.pre` we'd also want to examine `f.myfieldname.hl.simple.pre`. I appreciate where you were going with this, but in light of the latter point, I think you should simply modify the Solr UH adapter to look up, say, "pre" like so:
    
        String preTag = params.getFieldParam(fieldName, HighlightParams.TAG_PRE,
            params.getFieldParam(fieldName, HighlightParams.SIMPLE_PRE, "<em>"));
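A minimal, self-contained sketch of the fallback chain suggested above. SolrParams isn't available in a standalone example, so a plain Map stands in for it, and the hypothetical getFieldParam helper mimics the documented SolrParams lookup order: "f.<field>.<param>", then "<param>", then the supplied default. The class and method names are illustrative only.

```java
import java.util.Map;

/**
 * Sketch of the hl.tag.pre / hl.simple.pre fallback, with a Map standing in
 * for SolrParams. getFieldParam is a hypothetical stand-in for
 * SolrParams.getFieldParam(field, param, def).
 */
public class PreTagLookupDemo {

  // Try the per-field override first, then the global parameter, then the default.
  static String getFieldParam(Map<String, String> params, String field, String param, String def) {
    String value = params.get("f." + field + "." + param);
    if (value == null) {
      value = params.get(param);
    }
    return value != null ? value : def;
  }

  // hl.tag.pre (per-field, then global) wins; otherwise hl.simple.pre (per-field, then global); else "<em>".
  static String preTag(Map<String, String> params, String field) {
    return getFieldParam(params, field, "hl.tag.pre",
        getFieldParam(params, field, "hl.simple.pre", "<em>"));
  }

  public static void main(String[] args) {
    // A field-specific legacy parameter is honored when no tag.pre is set.
    System.out.println(preTag(Map.of("f.title.hl.simple.pre", "<b>"), "title"));
    // A global hl.tag.pre is consulted before any hl.simple.pre value.
    System.out.println(preTag(Map.of("f.title.hl.simple.pre", "<b>", "hl.tag.pre", "<mark>"), "title"));
    // Nothing set: fall back to the default.
    System.out.println(preTag(Map.of(), "body"));
  }
}
```

Because the calls are nested, a global hl.tag.pre still takes precedence over a field-specific hl.simple.pre, as the second call in main shows.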



[GitHub] lucene-solr pull request #107: SOLR-9708 UnifiedHighlighter Solr Plugin

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/107#discussion_r87740728
  
    --- Diff: solr/core/src/test/org/apache/solr/highlight/TestUnifiedSolrHighlighter.java ---
    @@ -0,0 +1,222 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.highlight;
    +
    +import org.apache.solr.SolrTestCaseJ4;
    +import org.apache.solr.handler.component.HighlightComponent;
    +import org.apache.solr.schema.IndexSchema;
    +import org.junit.BeforeClass;
    +import org.junit.Ignore;
    +
    +/** simple tests for PostingsSolrHighlighter */
    --- End diff --
    
    obsolete reference to PostingsSolrHighlighter


[GitHub] lucene-solr issue #107: SOLR-9708 UnifiedHighlighter Solr Plugin

Posted by Timothy055 <gi...@git.apache.org>.
Github user Timothy055 commented on the issue:

    https://github.com/apache/lucene-solr/pull/107
  
    I've made some more updates to the documentation for the pre and post parameters and fixed the defaulting logic.


[GitHub] lucene-solr pull request #107: SOLR-9708 UnifiedHighlighter Solr Plugin

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/107#discussion_r87740749
  
    --- Diff: solr/core/src/test-files/solr/collection1/conf/solrconfig-unifiedhighlight.xml ---
    @@ -0,0 +1,35 @@
    +<?xml version="1.0" ?>
    +
    +<!--
    + Licensed to the Apache Software Foundation (ASF) under one or more
    + contributor license agreements.  See the NOTICE file distributed with
    + this work for additional information regarding copyright ownership.
    + The ASF licenses this file to You under the Apache License, Version 2.0
    + (the "License"); you may not use this file except in compliance with
    + the License.  You may obtain a copy of the License at
    +
    +     http://www.apache.org/licenses/LICENSE-2.0
    +
    + Unless required by applicable law or agreed to in writing, software
    + distributed under the License is distributed on an "AS IS" BASIS,
    + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + See the License for the specific language governing permissions and
    + limitations under the License.
    +-->
    +
    +<!-- a basic solrconfig for postings highlighter -->
    --- End diff --
    
    obsolete postings highlighter reference


[GitHub] lucene-solr pull request #107: SOLR-9708 UnifiedHighlighter Solr Plugin

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/107#discussion_r87741032
  
    --- Diff: solr/core/src/java/org/apache/solr/highlight/UnifiedSolrHighlighter.java ---
    @@ -0,0 +1,366 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.highlight;
    +
    +import java.io.IOException;
    +import java.text.BreakIterator;
    +import java.util.Collections;
    +import java.util.List;
    +import java.util.Locale;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import org.apache.lucene.document.Document;
    +import org.apache.lucene.search.DocIdSetIterator;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.postingshighlight.WholeBreakIterator;
    +import org.apache.lucene.search.uhighlight.DefaultPassageFormatter;
    +import org.apache.lucene.search.uhighlight.PassageFormatter;
    +import org.apache.lucene.search.uhighlight.PassageScorer;
    +import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
    +import org.apache.solr.common.params.HighlightParams;
    +import org.apache.solr.common.params.SolrParams;
    +import org.apache.solr.common.util.NamedList;
    +import org.apache.solr.common.util.SimpleOrderedMap;
    +import org.apache.solr.core.PluginInfo;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.request.SolrRequestInfo;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.util.RTimerTree;
    +import org.apache.solr.util.plugin.PluginInfoInitialized;
    +
    +/*
    + * TODO: The HighlightComponent should not call rewrite on the query; it should be up to the
    + * SolrHighlighter to do if needed.  Furthermore this arrangement is odd -- why are these abstractions separate?
    + */
    +
    +/**
    + * Highlighter impl that uses {@link UnifiedHighlighter}
    + * <p>
    + * Example configuration with default values:
    + * <pre class="prettyprint">
    + * &lt;requestHandler name="standard" class="solr.StandardRequestHandler"&gt;
    + * &lt;lst name="defaults"&gt;
    + * &lt;int name="hl.snippets"&gt;1&lt;/int&gt;
    + * &lt;str name="hl.tag.pre"&gt;&amp;lt;em&amp;gt;&lt;/str&gt;
    + * &lt;str name="hl.tag.post"&gt;&amp;lt;/em&amp;gt;&lt;/str&gt;
    + * &lt;str name="hl.tag.ellipsis"&gt;... &lt;/str&gt;
    + * &lt;bool name="hl.defaultSummary"&gt;true&lt;/bool&gt;
    + * &lt;str name="hl.encoder"&gt;simple&lt;/str&gt;
    + * &lt;float name="hl.score.k1"&gt;1.2&lt;/float&gt;
    + * &lt;float name="hl.score.b"&gt;0.75&lt;/float&gt;
    + * &lt;float name="hl.score.pivot"&gt;87&lt;/float&gt;
    + * &lt;str name="hl.bs.language"&gt;&lt;/str&gt;
    + * &lt;str name="hl.bs.country"&gt;&lt;/str&gt;
    + * &lt;str name="hl.bs.variant"&gt;&lt;/str&gt;
    + * &lt;str name="hl.bs.type"&gt;SENTENCE&lt;/str&gt;
    + * &lt;int name="hl.maxAnalyzedChars"&gt;10000&lt;/int&gt;
    + * &lt;bool name="hl.highlightMultiTerm"&gt;true&lt;/bool&gt;
    + * &lt;/lst&gt;
    + * &lt;/requestHandler&gt;
    + * </pre>
    + * ...
    + * <pre class="prettyprint">
    + * &lt;searchComponent class="solr.HighlightComponent" name="highlight"&gt;
    + * &lt;highlighting class="org.apache.solr.highlight.UnifiedSolrHighlighter"/&gt;
    + * &lt;/searchComponent&gt;
    + * </pre>
    + * <p>
    + * Notes:
    + * <ul>
    + * <li>hl.q (string) can specify the query
    + * <li>hl.fl (string) specifies the field list.
    + * <li>hl.snippets (int) specifies how many snippets to return.
    + * <li>hl.tag.pre (string) specifies text which appears before a highlighted term.
    + * <li>hl.tag.post (string) specifies text which appears after a highlighted term.
    + * <li>hl.tag.ellipsis (string) specifies text which joins non-adjacent passages. The default is to retain each
    + * value in a list without joining them.
    + * <li>hl.defaultSummary (bool) specifies if a field should have a default summary of the leading text.
    + * <li>hl.encoder (string) can be 'html' (html escapes content) or 'simple' (no escaping).
    + * <li>hl.score.k1 (float) specifies bm25 scoring parameter 'k1'
    + * <li>hl.score.b (float) specifies bm25 scoring parameter 'b'
    + * <li>hl.score.pivot (float) specifies bm25 scoring parameter 'avgdl'
    + * <li>hl.bs.type (string) specifies how to divide text into passages: [SENTENCE, LINE, WORD, CHAR, WHOLE]
    + * <li>hl.bs.language (string) specifies language code for BreakIterator. default is empty string (root locale)
    + * <li>hl.bs.country (string) specifies country code for BreakIterator. default is empty string (root locale)
    + * <li>hl.bs.variant (string) specifies variant code for BreakIterator. default is empty string (root locale)
    + * <li>hl.maxAnalyzedChars specifies how many characters at most will be processed in a document for any one field.
    + * <li>hl.highlightMultiTerm enables highlighting for range/wildcard/fuzzy/prefix queries at some cost.
    + * <li>hl.usePhraseHighlighter (bool) enables highlighting phrases and some other queries strictly at some cost.</li>
    --- End diff --
    
    We know this is actually faster, so I think we can drop the "at some cost" remark here. But I think we forgot to include it in the example configuration above to indicate that it's true by default.
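For reference, a rough sketch of how the hl.bs.type and hl.bs.language/country/variant parameters documented in the javadoc above might map onto java.text.BreakIterator. The mapping is an assumption for illustration, not the patch's code, and the class and method names are hypothetical. WHOLE is left out because it relies on Lucene's WholeBreakIterator rather than a JDK class. Splitting the text from testMultipleSnippetsReturned with a SENTENCE instance also shows why that test's first expected snippet ends in a space: JDK sentence boundaries fall after the whitespace that follows a period.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical mapping from hl.bs.* parameters to JDK BreakIterators.
 * WHOLE is intentionally not handled (it needs Lucene's WholeBreakIterator).
 */
public class BreakIteratorDemo {

  static BreakIterator forType(String type, String language, String country, String variant) {
    // Empty language, country and variant yield the root locale, the documented default.
    // new Locale(...) is deprecated since Java 19 in favor of Locale.of, but still works.
    Locale locale = new Locale(language, country, variant);
    switch (type.toUpperCase(Locale.ROOT)) {
      case "SENTENCE": return BreakIterator.getSentenceInstance(locale);
      case "LINE":     return BreakIterator.getLineInstance(locale);
      case "WORD":     return BreakIterator.getWordInstance(locale);
      case "CHAR":     return BreakIterator.getCharacterInstance(locale);
      default: throw new IllegalArgumentException("unsupported hl.bs.type in this sketch: " + type);
    }
  }

  // Collects the consecutive segments between boundaries reported by the iterator.
  static List<String> passages(BreakIterator bi, String text) {
    List<String> out = new ArrayList<>();
    bi.setText(text);
    int start = bi.first();
    for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
      out.add(text.substring(start, end));
    }
    return out;
  }

  public static void main(String[] args) {
    BreakIterator bi = forType("SENTENCE", "", "", "");
    String text = "Document snippet one. Intermediate sentence. Document snippet two.";
    for (String p : passages(bi, text)) {
      // Brackets make trailing whitespace visible.
      System.out.println("[" + p + "]");
    }
  }
}
```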


[GitHub] lucene-solr pull request #107: SOLR-9708 UnifiedHighlighter Solr Plugin

Posted by Timothy055 <gi...@git.apache.org>.
Github user Timothy055 commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/107#discussion_r89227253
  
    --- Diff: solr/core/src/java/org/apache/solr/handler/component/HighlightComponent.java ---
    @@ -184,6 +185,20 @@ public void process(ResponseBuilder rb) throws IOException {
         }
       }
     
    +  /**
    +   * Normalizes parameters between highlighters
    +   */
    +  private SolrParams normalizeParameters(SolrParams params) {
    --- End diff --
    
    Fixed


[GitHub] lucene-solr pull request #107: SOLR-9708 UnifiedHighlighter Solr Plugin

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/107#discussion_r87740550
  
    --- Diff: solr/core/src/test/org/apache/solr/highlight/TestUnifiedSolrHighlighter.java ---
    @@ -0,0 +1,222 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.highlight;
    +
    +import org.apache.solr.SolrTestCaseJ4;
    +import org.apache.solr.handler.component.HighlightComponent;
    +import org.apache.solr.schema.IndexSchema;
    +import org.junit.BeforeClass;
    +import org.junit.Ignore;
    +
    +/** simple tests for PostingsSolrHighlighter */
    +public class TestUnifiedSolrHighlighter extends SolrTestCaseJ4 {
    +  
    +  @BeforeClass
    +  public static void beforeClass() throws Exception {
    +    initCore("solrconfig-unifiedhighlight.xml", "schema-unifiedhighlight.xml");
    +    
    +    // test our config is sane, just to be sure:
    +    
    +    // postingshighlighter should be used
    +    SolrHighlighter highlighter = HighlightComponent.getHighlighter(h.getCore());
    +    assertTrue("wrong highlighter: " + highlighter.getClass(), highlighter instanceof UnifiedSolrHighlighter);
    +    
    +    // 'text' and 'text3' should have offsets, 'text2' should not
    +    IndexSchema schema = h.getCore().getLatestSchema();
    +    assertTrue(schema.getField("text").storeOffsetsWithPositions());
    +    assertTrue(schema.getField("text3").storeOffsetsWithPositions());
    +    assertFalse(schema.getField("text2").storeOffsetsWithPositions());
    +  }
    +  
    +  @Override
    +  public void setUp() throws Exception {
    +    super.setUp();
    +    clearIndex();
    +    assertU(adoc("text", "document one", "text2", "document one", "text3", "crappy document", "id", "101"));
    +    assertU(adoc("text", "second document", "text2", "second document", "text3", "crappier document", "id", "102"));
    +    assertU(commit());
    +  }
    +  
    +  public void testSimple() {
    +    assertQ("simplest test", 
    +        req("q", "text:document", "sort", "id asc", "hl", "true"),
    +        "count(//lst[@name='highlighting']/*)=2",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em> one'",
    +        "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'");
    +  }
    +
    +  public void testMultipleSnippetsReturned() {
    +    clearIndex();
    +    assertU(adoc("text", "Document snippet one. Intermediate sentence. Document snippet two.",
    +        "text2", "document one", "text3", "crappy document", "id", "101"));
    +    assertU(commit());
    +    assertQ("multiple snippets test",
    +        req("q", "text:document", "sort", "id asc", "hl", "true", "hl.snippets", "2", "hl.bs.type", "SENTENCE"),
    +        "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=2",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Document</em> snippet one. '",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr/str[2]='<em>Document</em> snippet two.'");
    +  }
    +
    +  public void testStrictPhrasesEnabledByDefault() {
    +    clearIndex();
    +    assertU(adoc("text", "Strict phrases should be enabled for phrases",
    +        "text2", "document one", "text3", "crappy document", "id", "101"));
    +    assertU(commit());
    +    assertQ("strict phrase handling",
    +        req("q", "text:\"strict phrases\"", "sort", "id asc", "hl", "true"),
    +        "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Strict</em> <em>phrases</em> should be enabled for phrases'");
    +  }
    +
    +  public void testStrictPhrasesCanBeDisabled() {
    +    clearIndex();
    +    assertU(adoc("text", "Strict phrases should be disabled for phrases",
    +        "text2", "document one", "text3", "crappy document", "id", "101"));
    +    assertU(commit());
    +    assertQ("strict phrase handling",
    +        req("q", "text:\"strict phrases\"", "sort", "id asc", "hl", "true", "hl.usePhraseHighlighter", "false"),
    +        "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Strict</em> <em>phrases</em> should be disabled for <em>phrases</em>'");
    +  }
    +
    +  public void testMultiTermQueryEnabledByDefault() {
    +    clearIndex();
    +    assertU(adoc("text", "Aviary Avenue document",
    +        "text2", "document one", "text3", "crappy document", "id", "101"));
    +    assertU(commit());
    +    assertQ("multi term query handling",
    +        req("q", "text:av*", "sort", "id asc", "hl", "true"),
    +        "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Aviary</em> <em>Avenue</em> document'");
    +  }
    +
    +  public void testMultiTermQueryCanBeDisabled() {
    +    clearIndex();
    +    assertU(adoc("text", "Aviary Avenue document",
    +        "text2", "document one", "text3", "crappy document", "id", "101"));
    +    assertU(commit());
    +    assertQ("multi term query handling",
    +        req("q", "text:av*", "sort", "id asc", "hl", "true", "hl.highlightMultiTerm", "false"),
    +        "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=0");
    +  }
    +
    +  public void testPagination() {
    +    assertQ("pagination test", 
    +        req("q", "text:document", "sort", "id asc", "hl", "true", "rows", "1", "start", "1"),
    +        "count(//lst[@name='highlighting']/*)=1",
    +        "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'");
    +  }
    +  
    +  public void testEmptySnippet() {
    +    assertQ("null snippet test", 
    +      req("q", "text:one OR *:*", "sort", "id asc", "hl", "true"),
    +        "count(//lst[@name='highlighting']/*)=2",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='document <em>one</em>'",
    +        "count(//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/*)=0");
    +  }
    +  
    +  public void testDefaultSummary() {
    +    assertQ("null snippet test", 
    +      req("q", "text:one OR *:*", "sort", "id asc", "hl", "true", "hl.defaultSummary", "true"),
    +        "count(//lst[@name='highlighting']/*)=2",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='document <em>one</em>'",
    +        "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second document'");
    +  }
    +  
    +  public void testDifferentField() {
    +    assertQ("highlighting text3", 
    +        req("q", "text3:document", "sort", "id asc", "hl", "true", "hl.fl", "text3"),
    +        "count(//lst[@name='highlighting']/*)=2",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text3']/str='crappy <em>document</em>'",
    +        "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text3']/str='crappier <em>document</em>'");
    +  }
    +  
    +  public void testTwoFields() {
    +    assertQ("highlighting text and text3", 
    +        req("q", "text:document text3:document", "sort", "id asc", "hl", "true", "hl.fl", "text,text3"),
    +        "count(//lst[@name='highlighting']/*)=2",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em> one'",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text3']/str='crappy <em>document</em>'",
    +        "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'",
    +        "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text3']/str='crappier <em>document</em>'");
    +  }
    +
    +  //todo: need to configure field that is not at least stored, hence no analysis
    +  //otherwise, this highlighter is resilient
    +  @Ignore
    +  public void testMisconfiguredField() {
    +    ignoreException("was indexed without offsets");
    +    try {
    +      assertQ("should fail, has no offsets",
    +        req("q", "text2:document", "sort", "id asc", "hl", "true", "hl.fl", "text2"));
    +      fail();
    +    } catch (Exception expected) {
    +      // expected
    +    }
    +    resetExceptionIgnores();
    +  }
    +  
    +  public void testTags() {
    +    assertQ("different pre/post tags", 
    +        req("q", "text:document", "sort", "id asc", "hl", "true", "hl.tag.pre", "[", "hl.tag.post", "]"),
    +        "count(//lst[@name='highlighting']/*)=2",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='[document] one'",
    +        "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second [document]'");
    +  }
    +  
    +  public void testTagsPerField() {
    +    assertQ("highlighting text and text3", 
    +        req("q", "text:document text3:document", "sort", "id asc", "hl", "true", "hl.fl", "text,text3", "f.text3.hl.tag.pre", "[", "f.text3.hl.tag.post", "]"),
    +        "count(//lst[@name='highlighting']/*)=2",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em> one'",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text3']/str='crappy [document]'",
    +        "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'",
    +        "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text3']/str='crappier [document]'");
    +  }
    +  
    +  public void testBreakIterator() {
    +    assertQ("different breakiterator", 
    +        req("q", "text:document", "sort", "id asc", "hl", "true", "hl.bs.type", "WORD"),
    +        "count(//lst[@name='highlighting']/*)=2",
    +        "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em>'",
    +        "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='<em>document</em>'");
    +  }
    +  
    +  public void testBreakIterator2() {
    +    assertU(adoc("text", "Document one has a first sentence. Document two has a second sentence.", "id", "103"));
    +    assertU(commit());
    +    assertQ("different breakiterator", 
    +        req("q", "text:document", "sort", "id asc", "hl", "true", "hl.bs.type", "WHOLE"),
    +        "//lst[@name='highlighting']/lst[@name='103']/arr[@name='text']/str='<em>Document</em> one has a first sentence. <em>Document</em> two has a second sentence.'");
    +  }
    +  
    +  public void testEncoder() {
    +    assertU(adoc("text", "Document one has a first <i>sentence</i>.", "id", "103"));
    +    assertU(commit());
    +    assertQ("html escaped", 
    +        req("q", "text:document", "sort", "id asc", "hl", "true", "hl.encoder", "html"),
    +        "//lst[@name='highlighting']/lst[@name='103']/arr[@name='text']/str='<em>Document</em>&#32;one&#32;has&#32;a&#32;first&#32;&lt;i&gt;sentence&lt;&#x2F;i&gt;&#46;'");
    +  }
    +  
    +  public void testWildcard() {
    --- End diff --
    
    This test is made obsolete by the ones you added above.
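testTags and testTagsPerField exercise Solr's per-field parameter convention. A parameter named f.<field>.<param> overrides the global <param> for that one field. If neither is set, the highlighter uses its own default, which is <em>/</em> in the Javadoc quoted in the next message. The sketch below shows that lookup order using only the JDK, assuming request parameters arrive as a plain Map. It follows the same idea as SolrParams.getFieldParam, but the class and method names (PerFieldParams, fieldParam) are hypothetical and are not Solr's API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not Solr API) illustrating per-field highlighting parameter lookup. */
public class PerFieldParams {

    /**
     * Resolves a parameter for one field: "f.<field>.<param>" wins,
     * then the global "<param>", then the supplied default.
     */
    public static String fieldParam(Map<String, String> params, String field, String param, String def) {
        String perField = params.get("f." + field + "." + param);
        if (perField != null) {
            return perField;
        }
        String global = params.get(param);
        return global != null ? global : def;
    }

    public static void main(String[] args) {
        // Same request shape as testTagsPerField: only text3 overrides the tags.
        Map<String, String> params = new HashMap<>();
        params.put("hl.fl", "text,text3");
        params.put("f.text3.hl.tag.pre", "[");
        params.put("f.text3.hl.tag.post", "]");

        for (String field : params.get("hl.fl").split(",")) {
            String pre = fieldParam(params, field, "hl.tag.pre", "<em>");
            String post = fieldParam(params, field, "hl.tag.post", "</em>");
            System.out.println(field + ": " + pre + "document" + post);
        }
    }
}
```

With these inputs, the text field keeps the default <em> tags and text3 uses the bracket tags. This matches the expectations asserted in testTagsPerField.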




[GitHub] lucene-solr pull request #107: SOLR-9708 UnifiedHighlighter Solr Plugin

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/107#discussion_r87738726
  
    --- Diff: solr/core/src/java/org/apache/solr/highlight/UnifiedSolrHighlighter.java ---
    @@ -0,0 +1,366 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.highlight;
    +
    +import java.io.IOException;
    +import java.text.BreakIterator;
    +import java.util.Collections;
    +import java.util.List;
    +import java.util.Locale;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import org.apache.lucene.document.Document;
    +import org.apache.lucene.search.DocIdSetIterator;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.postingshighlight.WholeBreakIterator;
    +import org.apache.lucene.search.uhighlight.DefaultPassageFormatter;
    +import org.apache.lucene.search.uhighlight.PassageFormatter;
    +import org.apache.lucene.search.uhighlight.PassageScorer;
    +import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
    +import org.apache.solr.common.params.HighlightParams;
    +import org.apache.solr.common.params.SolrParams;
    +import org.apache.solr.common.util.NamedList;
    +import org.apache.solr.common.util.SimpleOrderedMap;
    +import org.apache.solr.core.PluginInfo;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.request.SolrRequestInfo;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.util.RTimerTree;
    +import org.apache.solr.util.plugin.PluginInfoInitialized;
    +
    +/*
    + * TODO: The HighlightComponent should not call rewrite on the query; it should be up to the
    + * SolrHighlighter to do if needed.  Furthermore this arrangement is odd -- why are these abstractions separate?
    + */
    +
    +/**
    + * Highlighter impl that uses {@link UnifiedHighlighter}
    + * <p>
    + * Example configuration with default values:
    + * <pre class="prettyprint">
    + * &lt;requestHandler name="standard" class="solr.StandardRequestHandler"&gt;
    + * &lt;lst name="defaults"&gt;
    + * &lt;int name="hl.snippets"&gt;1&lt;/int&gt;
    + * &lt;str name="hl.tag.pre"&gt;&amp;lt;em&amp;gt;&lt;/str&gt;
    + * &lt;str name="hl.tag.post"&gt;&amp;lt;/em&amp;gt;&lt;/str&gt;
    + * &lt;str name="hl.tag.ellipsis"&gt;... &lt;/str&gt;
    + * &lt;bool name="hl.defaultSummary"&gt;true&lt;/bool&gt;
    + * &lt;str name="hl.encoder"&gt;simple&lt;/str&gt;
    + * &lt;float name="hl.score.k1"&gt;1.2&lt;/float&gt;
    + * &lt;float name="hl.score.b"&gt;0.75&lt;/float&gt;
    + * &lt;float name="hl.score.pivot"&gt;87&lt;/float&gt;
    + * &lt;str name="hl.bs.language"&gt;&lt;/str&gt;
    + * &lt;str name="hl.bs.country"&gt;&lt;/str&gt;
    + * &lt;str name="hl.bs.variant"&gt;&lt;/str&gt;
    + * &lt;str name="hl.bs.type"&gt;SENTENCE&lt;/str&gt;
    + * &lt;int name="hl.maxAnalyzedChars"&gt;10000&lt;/int&gt;
    + * &lt;bool name="hl.highlightMultiTerm"&gt;true&lt;/bool&gt;
    + * &lt;/lst&gt;
    + * &lt;/requestHandler&gt;
    + * </pre>
    + * ...
    + * <pre class="prettyprint">
    + * &lt;searchComponent class="solr.HighlightComponent" name="highlight"&gt;
    + * &lt;highlighting class="org.apache.solr.highlight.UnifiedSolrHighlighter"/&gt;
    + * &lt;/searchComponent&gt;
    + * </pre>
    + * <p>
    + * Notes:
    + * <ul>
    + * <li>hl.q (string) can specify the query
    + * <li>hl.fl (string) specifies the field list.
    + * <li>hl.snippets (int) specifies how many snippets to return.
    + * <li>hl.tag.pre (string) specifies text which appears before a highlighted term.
    + * <li>hl.tag.post (string) specifies text which appears after a highlighted term.
    + * <li>hl.tag.ellipsis (string) specifies text which joins non-adjacent passages. The default is to retain each
    + * value in a list without joining them.
    + * <li>hl.defaultSummary (bool) specifies if a field should have a default summary of the leading text.
    + * <li>hl.encoder (string) can be 'html' (html escapes content) or 'simple' (no escaping).
    + * <li>hl.score.k1 (float) specifies bm25 scoring parameter 'k1'
    + * <li>hl.score.b (float) specifies bm25 scoring parameter 'b'
    + * <li>hl.score.pivot (float) specifies bm25 scoring parameter 'avgdl'
    + * <li>hl.bs.type (string) specifies how to divide text into passages: [SENTENCE, LINE, WORD, CHAR, WHOLE]
    + * <li>hl.bs.language (string) specifies language code for BreakIterator. default is empty string (root locale)
    + * <li>hl.bs.country (string) specifies country code for BreakIterator. default is empty string (root locale)
    + * <li>hl.bs.variant (string) specifies variant code for BreakIterator. default is empty string (root locale)
    + * <li>hl.maxAnalyzedChars specifies how many characters at most will be processed in a document for any one field.
    + * <li>hl.highlightMultiTerm enables highlighting for range/wildcard/fuzzy/prefix queries at some cost.
    + * <li>hl.usePhraseHighlighter (bool) enables highlighting phrases and some other queries strictly at some cost.</li>
    + * </ul>
    + * TODO add hl.method, hl.cacheFieldValCharsThreshold
    + *
    + * @lucene.experimental
    + */
    +public class UnifiedSolrHighlighter extends SolrHighlighter implements PluginInfoInitialized {
    +
    +    protected static final String SNIPPET_SEPARATOR = "\u0000";
    +    private static final String[] ZERO_LEN_STR_ARRAY = new String[0];
    +
    +    //TODO move to Solr HighlightParams
    --- End diff --
    
    These TODOs should be addressed (and note corresponding docs TODO on line 109).
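The Javadoc above maps hl.bs.type together with hl.bs.language, hl.bs.country and hl.bs.variant onto a java.text.BreakIterator. The sketch below shows one way to do that mapping with only the JDK. It assumes the three locale parameters are passed directly to the Locale(language, country, variant) constructor, so empty strings give the root locale. In the patch, WHOLE is handled by Lucene's WholeBreakIterator, which is outside the JDK, so this sketch rejects it. The class and method names are hypothetical. Running it on the text from testMultipleSnippetsReturned shows why that test expects the first snippet to keep a trailing space: the JDK sentence iterator places each boundary after the whitespace that follows the period.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: choose a JDK BreakIterator from hl.bs.*-style values. */
public class BreakIteratorChooser {

    public static BreakIterator choose(String type, String language, String country, String variant) {
        // Empty strings for all three yield the root locale, the documented default.
        Locale locale = new Locale(language, country, variant);
        switch (type) {
            case "SENTENCE": return BreakIterator.getSentenceInstance(locale);
            case "LINE":     return BreakIterator.getLineInstance(locale);
            case "WORD":     return BreakIterator.getWordInstance(locale);
            case "CHAR":     return BreakIterator.getCharacterInstance(locale);
            default:
                // WHOLE (one passage per value) needs Lucene's WholeBreakIterator; not covered here.
                throw new IllegalArgumentException("Unsupported hl.bs.type in this sketch: " + type);
        }
    }

    /** Splits text into the passages delimited by the iterator's boundaries. */
    public static List<String> passages(BreakIterator bi, String text) {
        List<String> out = new ArrayList<>();
        bi.setText(text);
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Document snippet one. Intermediate sentence. Document snippet two.";
        BreakIterator bi = choose("SENTENCE", "", "", "");
        for (String p : passages(bi, text)) {
            System.out.println("[" + p + "]");
        }
    }
}
```

Each bracketed line is one candidate passage. The highlighter would score these passages and return the best hl.snippets of them.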

