Posted to commits@solr.apache.org by ep...@apache.org on 2021/12/15 19:50:33 UTC
[solr] branch main updated: SOLR-15834: Films example readme needs updating and including useParams mention (#444)

This is an automated email from the ASF dual-hosted git repository.

epugh pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/solr.git


The following commit(s) were added to refs/heads/main by this push:
     new 64e8917  SOLR-15834: Films example readme needs updating and including useParams mention (#444)
64e8917 is described below

commit 64e891707f8d9d399f67efd8536e72df8034f89c
Author: Eric Pugh <ep...@opensourceconnections.com>
AuthorDate: Wed Dec 15 14:50:25 2021 -0500

    SOLR-15834: Films example readme needs updating and including useParams mention (#444)
    
    Update the Readme and the SolrCLI.java command to demonstrate adding a films collection with two ParamSets and picking between them with the useParams query parameter.
---
 solr/CHANGES.txt                                   |   2 +
 .../src/java/org/apache/solr/util/SolrCLI.java     |  25 ++++-
 solr/example/films/README.md                       | 108 +++++++++++++--------
 3 files changed, 90 insertions(+), 45 deletions(-)

diff --git a/solr/CHANGES.txt b/solr/CHANGES.txt
index 36cbe7a..1da35bd 100644
--- a/solr/CHANGES.txt
+++ b/solr/CHANGES.txt
@@ -181,6 +181,8 @@ when told to. The admin UI now tells it to. (Nazerke Seidan, David Smiley)
 
 * SOLR-15786: Add the "films" example to SolrCLI via -e films parameter. (Eric Pugh)
 
+* SOLR-15834: Films example readme needs updating, including useParams support for multiple algorithms.  (Eric Pugh) 
+
 Build
 ---------------------
 
diff --git a/solr/core/src/java/org/apache/solr/util/SolrCLI.java b/solr/core/src/java/org/apache/solr/util/SolrCLI.java
index 5128501..3f5d36b 100755
--- a/solr/core/src/java/org/apache/solr/util/SolrCLI.java
+++ b/solr/core/src/java/org/apache/solr/util/SolrCLI.java
@@ -2860,9 +2860,9 @@ public class SolrCLI implements CLIO {
         }
       }
       else if ("films".equals(exampleName) && !alreadyExists) {
-        echo("Adding name and initial_release_data fields to films schema \"_default\"");
-
         HttpSolrClient solrClient = new HttpSolrClient.Builder(solrUrl).build();
+
+        echo("Adding name and initial_release_date fields to films schema \"_default\"");
         try {
           SolrCLI.postJsonToSolr(solrClient, "/" + collectionName + "/schema", "{\n" +
                   "        \"add-field\" : {\n" +
@@ -2881,6 +2881,27 @@ public class SolrCLI implements CLIO {
           throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, ex);
         }
 
+        echo("Adding paramsets \"algo_a\" and \"algo_b\" to films configuration for relevancy tuning");
+        try {
+          SolrCLI.postJsonToSolr(solrClient, "/" + collectionName + "/config/params", "{\n" +
+                  "        \"set\": {\n" +
+                  "        \"algo_a\":{\n" +
+                  "               \"defType\":\"dismax\",\n" +
+                  "               \"qf\":\"name\"\n" +
+                  "             }\n" +
+                  "           },\n" +
+                  "           \"set\": {\n" +
+                  "             \"algo_b\":{\n" +
+                  "               \"defType\":\"dismax\",\n" +
+                  "               \"qf\":\"name\",\n" +
+                  "               \"mm\":\"100%\"\n" +
+                  "             }\n" +
+                  "            }\n" +
+                  "        }\n");
+        } catch (Exception ex) {
+          throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, ex);
+        }
+
         File filmsJsonFile = new File(exampleDir, "films/films.json");
         String updateUrl = String.format(Locale.ROOT, "%s/%s/update/json", solrUrl, collectionName);
         echo("Indexing films example docs from " + filmsJsonFile.getAbsolutePath());
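SolrCLI sends these setup requests through its internal `postJsonToSolr` helper on top of a SolrJ `HttpSolrClient`. For readers scripting the same setup outside SolrCLI, here is a minimal sketch that uses only the JDK's `java.net.http` client (Java 11+). It assumes a node at `http://localhost:8983/solr` and an existing `films` collection. `FilmsSetup` and `jsonPost` are hypothetical names. The schema body is copied from the copy/paste script this commit removes from the README, because the hunk above elides the exact body SolrCLI posts. Requests are only built, not sent, unless `--send` is passed.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical, illustrative class (not part of SolrCLI): builds the two JSON
// POSTs the films example performs, using only the JDK's java.net.http client.
final class FilmsSetup {

  // Schema API body, copied from the README script removed in this commit.
  static final String SCHEMA_JSON =
      "{\"add-field\": {\"name\": \"name\", \"type\": \"text_general\","
          + " \"multiValued\": false, \"stored\": true},"
          + " \"add-field\": {\"name\": \"initial_release_date\", \"type\": \"pdate\","
          + " \"stored\": true}}";

  // Request Parameters API body: two "set" commands, one per ParamSet,
  // matching the shape SolrCLI posts to /config/params.
  static final String PARAMSETS_JSON =
      "{\"set\": {\"algo_a\": {\"defType\": \"dismax\", \"qf\": \"name\"}},"
          + " \"set\": {\"algo_b\": {\"defType\": \"dismax\", \"qf\": \"name\", \"mm\": \"100%\"}}}";

  // Builds (but does not send) a JSON POST to <solrUrl>/<collection><path>.
  static HttpRequest jsonPost(String solrUrl, String collection, String path, String json) {
    return HttpRequest.newBuilder(URI.create(solrUrl + "/" + collection + path))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(json))
        .build();
  }

  public static void main(String[] args) throws Exception {
    String solrUrl = "http://localhost:8983/solr";
    HttpRequest[] requests = {
      jsonPost(solrUrl, "films", "/schema", SCHEMA_JSON),
      jsonPost(solrUrl, "films", "/config/params", PARAMSETS_JSON),
    };
    // Pass --send to actually POST; this needs a running Solr node with a films collection.
    boolean send = args.length > 0 && "--send".equals(args[0]);
    HttpClient client = send ? HttpClient.newHttpClient() : null;
    for (HttpRequest request : requests) {
      System.out.println(request.method() + " " + request.uri());
      if (send) {
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("  -> HTTP " + response.statusCode());
      }
    }
  }
}
```

The schema request is built first, mirroring the order in the SolrCLI hunk, so the `name` field exists before the ParamSets that reference it in `qf`.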
diff --git a/solr/example/films/README.md b/solr/example/films/README.md
index bd4bf82..2ed1713 100644
--- a/solr/example/films/README.md
+++ b/solr/example/films/README.md
@@ -1,20 +1,22 @@
 We have a movie data set in JSON, Solr XML, and CSV formats.  All 3 formats contain the same data.  You can use any one format to index documents to Solr.
 
-This example uses the _default configset that ships with Solr plus some custom fields added via Schema API.
+This example uses the `_default` configset that ships with Solr plus some custom fields added via the Schema API.  It demonstrates the use of ParamSets in conjunction with the [Request Parameters API](https://solr.apache.org/guide/request-parameters-api.html).
 
-The data is was fetched from Freebase and the data license is present in the films-LICENSE.txt file.  Freebase was shutdown in 2016 by Google.
+The original data was fetched from Freebase, and the data license is in the films-LICENSE.txt file.  Freebase was shut down by Google in 2016.
 
 This data consists of the following fields:
- * "id" - unique identifier for the movie
- * "name" - Name of the movie
- * "directed_by" - The person(s) who directed the making of the film
- * "initial_release_date" - The earliest official initial film screening date in any country
- * "genre" - The genre(s) that the movie belongs to
+ * `id` - unique identifier for the movie
+ * `name` - Name of the movie
+ * `directed_by` - The person(s) who directed the making of the film
+ * `initial_release_date` - The earliest official initial film screening date in any country
+ * `genre` - The genre(s) that the movie belongs to
 
- The "name" and "initial_release_date" are created via the Schema API, and the "genre" and "direct_by" fields
- are created by the use of an Update Request Processor Change called "add-unknown-fields-to-the-schema".
+ The `name` and `initial_release_date` fields are created via the Schema API, and the `genre` and `directed_by` fields
+ are created by an Update Request Processor Chain called `add-unknown-fields-to-the-schema`.
 
- The below steps walk you through learning how to start up Solr, setup the films collection yourself, and then load data.  You can also run `bin/solr start -e films` or `bin/solr start -c -e films` for SolrCloud version.
+ The steps below walk you through starting Solr, setting up the films collection yourself, and loading data.  We'll then create ParamSets to organize our query parameters.
+
+ You can also run `bin/solr start -example films`, or `bin/solr start -c -example films` for the SolrCloud version, which performs all of the steps below for you.
 
  Steps:
    * Start Solr:
@@ -57,6 +59,8 @@ This data consists of the following fields:
                   example/films/films.csv \
                   -params "f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|"
      ```
+
+
    * Let's get searching!
      - Search for 'Batman':
 
@@ -75,42 +79,60 @@ This data consists of the following fields:
 
        http://localhost:8983/solr/films/query?q=*:*&facet=true&facet.field=genre
 
+   * Time for relevancy tuning with ParamSets:
+
+     - Search for 'harry potter':
+
+       http://localhost:8983/solr/films/query?q=name:harry%20potter
+
+       * Notice that the very first result is the movie _Dumb &amp; Dumberer: When Harry Met Lloyd_.
+       That is clearly not what we are looking for.
+
+     - Let's set up two relevancy algorithms and then compare the quality of their results.
+         Algorithm *A* will use the `dismax` query parser with a `qf` parameter, and Algorithm *B*
+         will use `dismax` and `qf` plus a minimum-should-match (`mm`) of 100%, so every query term must match.
+
+         ```
+         curl http://localhost:8983/solr/films/config/params -X POST -H 'Content-type:application/json' --data-binary '{
+           "set": {
+             "algo_a":{
+               "defType":"dismax",
+               "qf":"name"
+             }
+           },
+           "set": {
+             "algo_b":{
+               "defType":"dismax",
+               "qf":"name",
+               "mm":"100%"
+             }
+           }
+         }'
+         ```
+
+     - Search for 'harry potter' with Algorithm *A*:
+
+       http://localhost:8983/solr/films/query?q=harry%20potter&useParams=algo_a
+
+       * Now five results are returned, including the Harry Potter movies; however, notice that
+         _Dumb &amp; Dumberer: When Harry Met Lloyd_ is still coming back.
+
+     - Search for 'harry potter' with Algorithm *B*:
+
+       http://localhost:8983/solr/films/query?q=harry%20potter&useParams=algo_b
+
+       * Now only the four Harry Potter movies are returned, giving more precise results!
+           This suggests that Algorithm *B* is better than Algorithm *A*.  You can extend
+           this comparison to online A/B testing to confirm the result with real users.
+
+
+
+
+
 FAQ:
   Why override the schema of the _name_ and _initial_release_date_ fields?
 
      Without overriding those field types, the _name_ field would have been guessed as a multi-valued string field type
-     and _initial_release_date_ would have been guessed as a multi-valued pdate type.  It makes more sense with this
+     and _initial_release_date_ would have been guessed as a multi-valued `pdate` type.  It makes more sense with this
      particular data set domain to have the movie name be a single valued general full-text searchable field,
      and for the release date also to be single valued.
-
-  How do I clear and reset my environment?
-
-      See the script below.
-
-  Is there an easy to copy/paste script to do all of the above?
-
-```
-    Here ya go << END_OF_SCRIPT
-
-bin/solr stop
-rm server/logs/*.log
-rm -Rf server/solr/films/
-bin/solr start
-bin/solr create -c films
-curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
-    "add-field" : {
-        "name":"name",
-        "type":"text_general",
-        "multiValued":false,
-        "stored":true
-    },
-    "add-field" : {
-        "name":"initial_release_date",
-        "type":"pdate",
-        "stored":true
-    }
-}'
-bin/post -c films example/films/films.json
-
-# END_OF_SCRIPT
-```
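The README's relevancy-tuning walkthrough comes down to sending the same `q` with different `useParams` values and comparing the result lists. The following is a minimal Java sketch that builds those request URLs, assuming the default `http://localhost:8983/solr` base URL and the `algo_a`/`algo_b` ParamSet names from this commit. `UseParamsUrls` and `queryUrl` are hypothetical names, not a Solr API. The code only constructs strings, so it runs without a Solr node.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a Solr API): builds /query URLs that differ only in
// the useParams value, so the algo_a and algo_b ParamSets can be compared.
final class UseParamsUrls {

  // Returns <solrUrl>/<collection>/query?q=<q>[&useParams=<paramSet>].
  // A null paramSet omits useParams entirely.
  static String queryUrl(String solrUrl, String collection, String q, String paramSet) {
    // URLEncoder does form encoding (space -> "+"), and a literal "+" becomes "%2B",
    // so replacing "+" with "%20" only touches spaces, matching the README URLs.
    String encodedQ = URLEncoder.encode(q, StandardCharsets.UTF_8).replace("+", "%20");
    StringBuilder url = new StringBuilder(solrUrl)
        .append('/').append(collection)
        .append("/query?q=").append(encodedQ);
    if (paramSet != null) {
      url.append("&useParams=").append(URLEncoder.encode(paramSet, StandardCharsets.UTF_8));
    }
    return url.toString();
  }

  public static void main(String[] args) {
    String solrUrl = "http://localhost:8983/solr";
    for (String paramSet : new String[] {"algo_a", "algo_b"}) {
      System.out.println(paramSet + ": " + queryUrl(solrUrl, "films", "harry potter", paramSet));
    }
  }
}
```

Running `main` prints the same two URLs the walkthrough uses for Algorithm *A* and Algorithm *B*; opening each against a running node lets you compare the result lists side by side.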