Posted to commits@lucene.apache.org by ep...@apache.org on 2020/06/17 15:02:16 UTC

[lucene-solr] branch master updated: SOLR-14572 document missing SearchComponents (#1581)

This is an automated email from the ASF dual-hosted git repository.

epugh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git


The following commit(s) were added to refs/heads/master by this push:
     new 207efbc  SOLR-14572 document missing SearchComponents (#1581)
207efbc is described below

commit 207efbceeb2fbf977f62516d7dcd9cae4c9d4e67
Author: Eric Pugh <ep...@opensourceconnections.com>
AuthorDate: Wed Jun 17 11:01:49 2020 -0400

    SOLR-14572 document missing SearchComponents (#1581)
    
    * Add an example explaining how to use
    
    * fix up JavaDoc formatting
    
    * add missing SearchComponents that ship with Solr, and point to external site with components.
    
    * fix path
    
    * simplify page layout by consolidating to lists
    
    * add missing components that are documented elsewhere in refguide
    
    * try to get pathing to pass precommit
    
    * remove mention of solr.cool, in favour of a separate PR that handles it differently
---
 .../update/processor/URLClassifyProcessor.java     | 56 ++++++++++++++++++++++
 ...andlers-and-searchcomponents-in-solrconfig.adoc |  9 ++++
 2 files changed, 65 insertions(+)

diff --git a/solr/core/src/java/org/apache/solr/update/processor/URLClassifyProcessor.java b/solr/core/src/java/org/apache/solr/update/processor/URLClassifyProcessor.java
index a3697e2..9d727c7 100644
--- a/solr/core/src/java/org/apache/solr/update/processor/URLClassifyProcessor.java
+++ b/solr/core/src/java/org/apache/solr/update/processor/URLClassifyProcessor.java
@@ -33,14 +33,70 @@ import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 /**
+ * <p>
  * Update processor which examines a URL and outputs to various other fields
  * characteristics of that URL, including length, number of path levels, whether
  * it is a top level URL (levels==0), whether it looks like a landing/index page,
  * a canonical representation of the URL (e.g. stripping index.html), the domain
  * and path parts of the URL etc.
+ * </p>
+ *
  * <p>
  * This processor is intended used in connection with processing web resources,
  * and helping to produce values which may be used for boosting or filtering later.
+ * </p>
+ *
+ * <p>
+ * In the example configuration below, we construct a custom
+ * <code>updateRequestProcessorChain</code> and then instruct the
+ * <code>/update</code> request handler to use it for every incoming document.
+ * </p>
+ * <pre class="prettyprint">
+ * &lt;updateRequestProcessorChain name="urlProcessor"&gt;
+ *   &lt;processor class="org.apache.solr.update.processor.URLClassifyProcessorFactory"&gt;
+ *     &lt;bool name="enabled"&gt;true&lt;/bool&gt;
+ *     &lt;str name="inputField"&gt;id&lt;/str&gt;
+ *     &lt;str name="domainOutputField"&gt;hostname&lt;/str&gt;
+ *   &lt;/processor&gt;
+ *   &lt;processor class="solr.RunUpdateProcessorFactory" /&gt;
+ * &lt;/updateRequestProcessorChain&gt;
+ *
+ * &lt;requestHandler name="/update" class="solr.UpdateRequestHandler"&gt;
+ *   &lt;lst name="defaults"&gt;
+ *     &lt;str name="update.chain"&gt;urlProcessor&lt;/str&gt;
+ *   &lt;/lst&gt;
+ * &lt;/requestHandler&gt;
+ * </pre>
+ * <p>
+ * Then, at index time, Solr will look at the <code>id</code> field value and extract
+ * its domain portion into a new <code>hostname</code> field. By default, the
+ * following fields will also be added:
+ * </p>
+ * <ul>
+ *  <li>url_length</li>
+ *  <li>url_levels</li>
+ *  <li>url_toplevel</li>
+ *  <li>url_landingpage</li>
+ * </ul>
+ * <p>
+ * For example, adding the following document:</p>
+ * <pre class="prettyprint">
+ * { "id":"http://wwww.mydomain.com/subpath/document.html" }
+ * </pre>
+ * <p>
+ * will result in this document in Solr:
+ * </p>
+ * <pre class="prettyprint">
+ * {
+ *  "id":"http://wwww.mydomain.com/subpath/document.html",
+ *  "url_length":["46"],
+ *  "url_levels":["2"],
+ *  "url_toplevel":["0"],
+ *  "url_landingpage":["0"],
+ *  "hostname":["wwww.mydomain.com"],
+ *  "_version_":1603193062117343232
+ * }
+ * </pre>
  */
 public class URLClassifyProcessor extends UpdateRequestProcessor {
 
diff --git a/solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc b/solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc
index fab9606..52e2788 100644
--- a/solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc
+++ b/solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc
@@ -169,3 +169,12 @@ Many of the other useful components are described in sections of this Guide for
 * `TermVectorComponent`, described in the section <<the-term-vector-component.adoc#the-term-vector-component,The Term Vector Component>>.
 * `QueryElevationComponent`, described in the section <<the-query-elevation-component.adoc#the-query-elevation-component,The Query Elevation Component>>.
 * `TermsComponent`, described in the section <<the-terms-component.adoc#the-terms-component,The Terms Component>>.
+* `RealTimeGetComponent`, described in the section <<realtime-get.adoc#realtime-get,RealTime Get>>.
+* `ClusteringComponent`, described in the section <<result-clustering.adoc#result-clustering,Result Clustering>>.
+* `SuggestComponent`, described in the section <<suggester.adoc#suggester,Suggester>>.
+* `AnalyticsComponent`, described in the section <<analytics.adoc#analytics,Analytics>>.
+
+Other components that ship with Solr include:
+
+* `ResponseLogComponent`, used to record which documents are returned to the user via the Solr log, described in the {solr-javadocs}solr-core/org/apache/solr/handler/component/ResponseLogComponent.html[ResponseLogComponent] javadocs.
+* `PhrasesIdentificationComponent`, used to identify & score "phrases" found in the input string, based on shingles in indexed fields, described in the {solr-javadocs}solr-core/org/apache/solr/handler/component/PhrasesIdentificationComponent.html[PhrasesIdentificationComponent] javadocs.
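
The URLClassifyProcessor Javadoc example in this commit maps one URL to a hostname, a length, and a level count. The standalone sketch below reproduces three of those values with plain java.net.URI so the numbers can be checked by hand. It is not Solr's implementation: the class name UrlClassifySketch, the classify method, and the rule of counting '/' separators in the path are assumptions chosen only because they match the example output.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr: approximates a few of the fields
// shown in the URLClassifyProcessor Javadoc example.
final class UrlClassifySketch {

  static Map<String, Object> classify(String url) {
    URI uri = URI.create(url);
    String path = uri.getPath() == null ? "" : uri.getPath();

    // Assumption: a "level" is one '/' separator in the path.
    int levels = (int) path.chars().filter(c -> c == '/').count();

    Map<String, Object> fields = new LinkedHashMap<>();
    fields.put("url_length", url.length()); // length of the full URL string
    fields.put("url_levels", levels);
    fields.put("hostname", uri.getHost());  // domain portion of the URL
    return fields;
  }

  public static void main(String[] args) {
    // Same input document id as in the Javadoc example.
    System.out.println(classify("http://wwww.mydomain.com/subpath/document.html"));
  }
}
```

For the URL used in the Javadoc, this yields a length of 46, two levels, and the hostname wwww.mydomain.com, which agrees with the indexed document shown there. The real processor also emits url_toplevel and url_landingpage, which this sketch does not attempt.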