Posted to commits@solr.apache.org by ep...@apache.org on 2021/12/15 19:50:33 UTC
[solr] branch main updated: SOLR-15834: Films example readme needs updating and including useParams mention (#444)
This is an automated email from the ASF dual-hosted git repository.
epugh pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/solr.git
The following commit(s) were added to refs/heads/main by this push:
new 64e8917 SOLR-15834: Films example readme needs updating and including useParams mention (#444)
64e8917 is described below
commit 64e891707f8d9d399f67efd8536e72df8034f89c
Author: Eric Pugh <ep...@opensourceconnections.com>
AuthorDate: Wed Dec 15 14:50:25 2021 -0500
SOLR-15834: Films example readme needs updating and including useParams mention (#444)
Update the Readme and the SolrCLI.java command to demonstrate adding a films collection with two ParamSets and picking between them with the useParams query parameter.
---
solr/CHANGES.txt | 2 +
.../src/java/org/apache/solr/util/SolrCLI.java | 25 ++++-
solr/example/films/README.md | 108 +++++++++++++--------
3 files changed, 90 insertions(+), 45 deletions(-)
diff --git a/solr/CHANGES.txt b/solr/CHANGES.txt
index 36cbe7a..1da35bd 100644
--- a/solr/CHANGES.txt
+++ b/solr/CHANGES.txt
@@ -181,6 +181,8 @@ when told to. The admin UI now tells it to. (Nazerke Seidan, David Smiley)
* SOLR-15786: Add the "films" example to SolrCLI via -e films parameter. (Eric Pugh)
+* SOLR-15834: Films example readme needs updating, including useParams support for multiple algorithms. (Eric Pugh)
+
Build
---------------------
diff --git a/solr/core/src/java/org/apache/solr/util/SolrCLI.java b/solr/core/src/java/org/apache/solr/util/SolrCLI.java
index 5128501..3f5d36b 100755
--- a/solr/core/src/java/org/apache/solr/util/SolrCLI.java
+++ b/solr/core/src/java/org/apache/solr/util/SolrCLI.java
@@ -2860,9 +2860,9 @@ public class SolrCLI implements CLIO {
}
}
else if ("films".equals(exampleName) && !alreadyExists) {
- echo("Adding name and initial_release_data fields to films schema \"_default\"");
-
HttpSolrClient solrClient = new HttpSolrClient.Builder(solrUrl).build();
+
+ echo("Adding name and initial_release_date fields to films schema \"_default\"");
try {
SolrCLI.postJsonToSolr(solrClient, "/" + collectionName + "/schema", "{\n" +
" \"add-field\" : {\n" +
@@ -2881,6 +2881,27 @@ public class SolrCLI implements CLIO {
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, ex);
}
+ echo("Adding paramsets \"algo_a\" and \"algo_b\" to films configuration for relevancy tuning");
+ try {
+ SolrCLI.postJsonToSolr(solrClient, "/" + collectionName + "/config/params", "{\n" +
+ " \"set\": {\n" +
+ " \"algo_a\":{\n" +
+ " \"defType\":\"dismax\",\n" +
+ " \"qf\":\"name\"\n" +
+ " }\n" +
+ " },\n" +
+ " \"set\": {\n" +
+ " \"algo_b\":{\n" +
+ " \"defType\":\"dismax\",\n" +
+ " \"qf\":\"name\",\n" +
+ " \"mm\":\"100%\"\n" +
+ " }\n" +
+ " }\n" +
+ " }\n");
+ } catch (Exception ex) {
+ throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, ex);
+ }
+
File filmsJsonFile = new File(exampleDir, "films/films.json");
String updateUrl = String.format(Locale.ROOT, "%s/%s/update/json", solrUrl, collectionName);
echo("Indexing films example docs from " + filmsJsonFile.getAbsolutePath());
diff --git a/solr/example/films/README.md b/solr/example/films/README.md
index bd4bf82..2ed1713 100644
--- a/solr/example/films/README.md
+++ b/solr/example/films/README.md
@@ -1,20 +1,22 @@
We have a movie data set in JSON, Solr XML, and CSV formats. All 3 formats contain the same data. You can use any one format to index documents to Solr.
-This example uses the _default configset that ships with Solr plus some custom fields added via Schema API.
+This example uses the `_default` configset that ships with Solr plus some custom fields added via Schema API. It demonstrates the use of ParamSets in conjunction with the [Request Parameters API](https://solr.apache.org/guide/request-parameters-api.html).
-The data is was fetched from Freebase and the data license is present in the films-LICENSE.txt file. Freebase was shutdown in 2016 by Google.
+The original data was fetched from Freebase, and the data license is present in the films-LICENSE.txt file. Google shut down Freebase in 2016.
This data consists of the following fields:
- * "id" - unique identifier for the movie
- * "name" - Name of the movie
- * "directed_by" - The person(s) who directed the making of the film
- * "initial_release_date" - The earliest official initial film screening date in any country
- * "genre" - The genre(s) that the movie belongs to
+ * `id` - unique identifier for the movie
+ * `name` - Name of the movie
+ * `directed_by` - The person(s) who directed the making of the film
+ * `initial_release_date` - The earliest official initial film screening date in any country
+ * `genre` - The genre(s) that the movie belongs to
- The "name" and "initial_release_date" are created via the Schema API, and the "genre" and "direct_by" fields
- are created by the use of an Update Request Processor Change called "add-unknown-fields-to-the-schema".
+ The `name` and `initial_release_date` fields are created via the Schema API, and the `genre` and `directed_by` fields
+ are created by the use of an Update Request Processor Chain called `add-unknown-fields-to-the-schema`.
- The below steps walk you through learning how to start up Solr, setup the films collection yourself, and then load data. You can also run `bin/solr start -e films` or `bin/solr start -c -e films` for SolrCloud version.
+ The steps below walk you through starting Solr, setting up the films collection yourself, and loading data. We'll then create ParamSets to organize our query parameters.
+
+ You can also run `bin/solr start -example films`, or `bin/solr start -c -example films` for the SolrCloud version, which does all of the steps below for you.
Steps:
* Start Solr:
@@ -57,6 +59,8 @@ This data consists of the following fields:
example/films/films.csv \
-params "f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|"
```
+
+
* Let's get searching!
- Search for 'Batman':
@@ -75,42 +79,60 @@ This data consists of the following fields:
http://localhost:8983/solr/films/query?q=*:*&facet=true&facet.field=genre
+ * Time for relevancy tuning with ParamSets:
+
+ - Search for 'harry potter':
+
+ http://localhost:8983/solr/films/query?q=name:harry%20potter
+
+ * Notice that the very first result is the movie _Dumb & Dumberer: When Harry Met Lloyd_?
+ That is clearly not what we are looking for.
+
+ - Let's set up two relevancy algorithms, and then compare the quality of the results.
+ Algorithm *A* will use `dismax` and a `qf` parameter, and Algorithm *B*
+ will use `dismax`, `qf`, and a minimum-should-match (`mm`) of 100%.
+
+ ```
+ curl http://localhost:8983/solr/films/config/params -X POST -H 'Content-type:application/json' --data-binary '{
+ "set": {
+ "algo_a":{
+ "defType":"dismax",
+ "qf":"name"
+ }
+ },
+ "set": {
+ "algo_b":{
+ "defType":"dismax",
+ "qf":"name",
+ "mm":"100%"
+ }
+ }
+ }'
+ ```
+
+ - Search for 'harry potter' with Algorithm *A*:
+
+ http://localhost:8983/solr/films/query?q=harry%20potter&useParams=algo_a
+
+ * Now we are returning five results, including the Harry Potter movies. However, notice that we still have the
+ _Dumb & Dumberer: When Harry Met Lloyd_ movie coming back.
+
+ - Search for 'harry potter' with Algorithm *B*:
+
+ http://localhost:8983/solr/films/query?q=harry%20potter&useParams=algo_b
+
+ * We are now returning only the four Harry Potter movies, leading to more precise results!
+ We can say that we believe Algorithm *B* is better than Algorithm *A*. You can extend
+ this to online A/B testing very easily to confirm with real users.
+
+
+
+
+
FAQ:
Why override the schema of the _name_ and _initial_release_date_ fields?
Without overriding those field types, the _name_ field would have been guessed as a multi-valued string field type
- and _initial_release_date_ would have been guessed as a multi-valued pdate type. It makes more sense with this
+ and _initial_release_date_ would have been guessed as a multi-valued `pdate` type. It makes more sense with this
particular data set domain to have the movie name be a single valued general full-text searchable field,
and for the release date also to be single valued.
-
- How do I clear and reset my environment?
-
- See the script below.
-
- Is there an easy to copy/paste script to do all of the above?
-
-```
- Here ya go << END_OF_SCRIPT
-
-bin/solr stop
-rm server/logs/*.log
-rm -Rf server/solr/films/
-bin/solr start
-bin/solr create -c films
-curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
- "add-field" : {
- "name":"name",
- "type":"text_general",
- "multiValued":false,
- "stored":true
- },
- "add-field" : {
- "name":"initial_release_date",
- "type":"pdate",
- "stored":true
- }
-}'
-bin/post -c films example/films/films.json
-
-# END_OF_SCRIPT
-```
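
The ParamSet setup and the `useParams` queries from the README changes above can be sketched in plain Java. This is only an illustration, not the code in `SolrCLI.java`: the class `ParamSetSketch` and its helpers `setCommand` and `queryUrl` are hypothetical names, the JSON is assembled by hand on the assumption that names and values contain no quotes or backslashes, and nothing is actually sent to Solr.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class for illustration; not part of Solr.
class ParamSetSketch {

  // Builds the JSON body of a single "set" command for POSTing to
  // /solr/<collection>/config/params. Assumes keys and values need no
  // JSON escaping (no quotes or backslashes).
  static String setCommand(String paramSetName, Map<String, String> params) {
    StringBuilder sb = new StringBuilder();
    sb.append("{\"set\":{\"").append(paramSetName).append("\":{");
    boolean first = true;
    for (Map.Entry<String, String> e : params.entrySet()) {
      if (!first) {
        sb.append(',');
      }
      sb.append('"').append(e.getKey()).append("\":\"")
        .append(e.getValue()).append('"');
      first = false;
    }
    return sb.append("}}}").toString();
  }

  // Builds a query URL that selects a ParamSet through useParams.
  // URLEncoder encodes a space as '+', which is valid in a query string.
  static String queryUrl(String baseUrl, String collection, String q, String paramSet) {
    return baseUrl + "/" + collection + "/query?q="
        + URLEncoder.encode(q, StandardCharsets.UTF_8)
        + "&useParams=" + URLEncoder.encode(paramSet, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps the parameters in insertion order.
    Map<String, String> algoB = new LinkedHashMap<>();
    algoB.put("defType", "dismax");
    algoB.put("qf", "name");
    algoB.put("mm", "100%");

    System.out.println(setCommand("algo_b", algoB));
    System.out.println(queryUrl("http://localhost:8983/solr", "films", "harry potter", "algo_b"));
  }
}
```

The sketch builds one `set` command per request body. The commit instead posts both ParamSets in one body with a repeated `"set"` key; sending them separately is a design choice here, made because generic JSON tooling may collapse duplicate keys.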