Posted to commits@lucene.apache.org by eh...@apache.org on 2015/01/09 23:40:14 UTC

svn commit: r1650689 [1/3] - in /lucene/dev/branches/branch_5x: ./ solr/ solr/example/ solr/example/films/README.txt solr/example/films/film_data_generator.py solr/example/films/films.csv solr/example/films/films.json solr/example/films/films.xml

Author: ehatcher
Date: Fri Jan  9 22:40:14 2015
New Revision: 1650689

URL: http://svn.apache.org/r1650689
Log:
SOLR-6127: More improvements to the films example: remove fake document, README steps polished (merged from trunk r1650688)

Modified:
    lucene/dev/branches/branch_5x/   (props changed)
    lucene/dev/branches/branch_5x/solr/   (props changed)
    lucene/dev/branches/branch_5x/solr/example/   (props changed)
    lucene/dev/branches/branch_5x/solr/example/films/README.txt
    lucene/dev/branches/branch_5x/solr/example/films/film_data_generator.py
    lucene/dev/branches/branch_5x/solr/example/films/films.csv
    lucene/dev/branches/branch_5x/solr/example/films/films.json
    lucene/dev/branches/branch_5x/solr/example/films/films.xml

Modified: lucene/dev/branches/branch_5x/solr/example/films/README.txt
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_5x/solr/example/films/README.txt?rev=1650689&r1=1650688&r2=1650689&view=diff
==============================================================================
--- lucene/dev/branches/branch_5x/solr/example/films/README.txt (original)
+++ lucene/dev/branches/branch_5x/solr/example/films/README.txt Fri Jan  9 22:40:14 2015
@@ -3,7 +3,7 @@ All 3 formats contain the same data.  Yo
 
 The data is fetched from Freebase and the data license is present in the films-LICENSE.txt file.
 
-This data consists of the following fields -
+This data consists of the following fields:
  * "id" - unique identifier for the movie
  * "name" - Name of the movie
  * "directed_by" - The person(s) who directed the making of the film
@@ -14,43 +14,45 @@ This data consists of the following fiel
    * Start Solr:
        bin/solr start
 
-   * Create a "films" core
+   * Create a "films" core:
        bin/solr create_core -n films -c data_driven_schema_configs
 
-   * Set the schema on a couple of fields that Solr would otherwise guess differently about:
-curl http://localhost:8983/solr/films/schema/fields -X POST -H 'Content-type:application/json' --data-binary '
-[
-    {
+   * Set the schema on a couple of fields that Solr would otherwise guess differently (than we'd like) about:
+curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
+    "add-field" : {
         "name":"name",
         "type":"text_general",
         "stored":true
     },
-    {
+    "add-field" : {
         "name":"initial_release_date",
         "type":"tdate",
         "stored":true
     }
-]'
+}'
 
    * Now let's index the data, using one of these three commands:
 
      - JSON: bin/post films example/films/films.json
      - XML: bin/post films example/films/films.xml
-     - CSV: bin/post films example/films/films.csv "params=f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|"
+     - CSV: bin/post \
+                  films \
+                  example/films/films.csv \
+                  "params=f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|"
 
-   * Let's get searching.
+   * Let's get searching!
      - Search for 'Batman':
        http://localhost:8983/solr/films/query?q=name:batman
 
        * If you get an error about the name field not existing, you haven't yet indexed the data
        * If you don't get an error, but zero results, chances are that the _name_ field schema type override wasn't set
-         before indexing the data the first time.  It's easiest to simply reset the environment and try again, ensuring
-         that each step successfully executes.
+         before indexing the data the first time (it ended up as a "string" type, requiring exact matching by case even).
+         It's easiest to simply reset the environment and try again, ensuring that each step successfully executes.
 
      - Show me all 'Super hero' movies:
        http://localhost:8983/solr/films/query?q=*:*&fq=genre:%22Superhero%20movie%22
 
-     - Let's see the distribution of genres across all the movies. See the facet section for the counts:
+     - Let's see the distribution of genres across all the movies. See the facet section of the response for the counts:
        http://localhost:8983/solr/films/query?q=*:*&facet=true&facet.field=genre
 
      - Browse the indexed films in a traditional browser search interface:
@@ -59,6 +61,18 @@ curl http://localhost:8983/solr/films/sc
        Now browse including the genre field as a facet:
        http://localhost:8983/solr/films/browse?facet.field=genre
 
+       If you want to set a facet for /browse to keep around for every request add the facet.field into the "facets"
+       param set (which the /browse handler is already configured to use):
+curl http://localhost:8983/solr/films/config/params -H 'Content-type:application/json'  -d '{
+"update" : {
+  "facets": {
+    "facet.field":"genre"
+    }
+  }
+}'
+
+        And now http://localhost:8983/solr/films/browse will display the _genre_ facet automatically.
+
 Exploring the data further - 
 
   * Increase the MAX_ITERATIONS value, put in your freebase API_KEY and run the film_data_generator.py script using Python 3.
@@ -68,11 +82,55 @@ FAQ:
   Why override the schema of the _name_ and _initial_release_date_ fields?
 
      Without overriding those field types, the _name_ field would have been guessed as a multi-valued string field type
-     and _initial_release_date_ would have been guessed as a multi-valued tdate type.  It makes more sense in our application
-     to have the movie name be a single valued general full-text searchable field, and for the release date also to be single valued.
+     and _initial_release_date_ would have been guessed as a multi-valued tdate type.  It makes more sense with this
+     particular data set domain to have the movie name be a single valued general full-text searchable field,
+     and for the release date also to be single valued.
 
   How do I clear and reset my environment?
 
-     bin/solr stop
-     rm -Rf server/solr/films/
-     # then start from the beginning of the instructions to start fresh
\ No newline at end of file
+      See the script below.
+
+  Is there an easy to copy/paste script to do all of the above?
+
+    Here ya go << END_OF_SCRIPT
+
+bin/solr stop
+rm server/logs/*.log
+rm -Rf server/solr/films/
+bin/solr start
+bin/solr create_core -n films -c data_driven_schema_configs
+curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
+    "add-field" : {
+        "name":"name",
+        "type":"text_general",
+        "stored":true
+    },
+    "add-field" : {
+        "name":"initial_release_date",
+        "type":"tdate",
+        "stored":true
+    }
+}'
+bin/post films example/films/films.json
+curl http://localhost:8983/solr/films/config/params -H 'Content-type:application/json'  -d '{
+"update" : {
+  "facets": {
+    "facet.field":"genre"
+    }
+  }
+}'
+
+# END_OF_SCRIPT
+
+Additional fun -
+
+Add highlighting:
+curl http://localhost:8983/solr/films/config/params -H 'Content-type:application/json'  -d '{
+"set" : {
+  "browse": {
+    "hl":"on",
+    "hl.fl":"name"
+    }
+  }
+}'
+try http://localhost:8983/solr/films/browse?q=batman now, and you'll see "batman" highlighted in the results
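
The schema override in the README diff above repeats the "add-field" key inside one JSON object, which a dictionary-based JSON library cannot produce. Below is a minimal Python sketch of the same request body. It assumes the Schema API also accepts a list of field definitions under a single "add-field" key. The helper names build_add_field_payload and post_json are hypothetical and not part of this commit.

```python
import json
import urllib.request

# Field overrides copied from the README steps in this commit.
FIELDS = [
    {"name": "name", "type": "text_general", "stored": True},
    {"name": "initial_release_date", "type": "tdate", "stored": True},
]


def build_add_field_payload(fields):
    """Return a JSON body with every field under one "add-field" key.

    Assumption: the Schema API accepts a list of definitions here, as an
    alternative to repeating the "add-field" key once per field.
    """
    return json.dumps({"add-field": list(fields)}, indent=2)


def post_json(url, body):
    """POST a JSON string and return the response text (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


payload = build_add_field_payload(FIELDS)
print(payload)

# To apply it against a locally running Solr (not executed in this sketch):
# post_json("http://localhost:8983/solr/films/schema", payload)
```

The payload is built separately from the HTTP call so it can be inspected or logged before anything is sent to the server.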

Modified: lucene/dev/branches/branch_5x/solr/example/films/film_data_generator.py
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_5x/solr/example/films/film_data_generator.py?rev=1650689&r1=1650688&r2=1650689&view=diff
==============================================================================
--- lucene/dev/branches/branch_5x/solr/example/films/film_data_generator.py (original)
+++ lucene/dev/branches/branch_5x/solr/example/films/film_data_generator.py Fri Jan  9 22:40:14 2015
@@ -104,10 +104,6 @@ def do_query(filmlist, cursor=""):
 
 if __name__ == "__main__":
   filmlist = []
-  #Adding 1 entry manually to play nice with schemaless mode
-  firstFilm = {'directed_by': ['Wes Anderson'], 'initial_release_date': '2014-03-28', 'genre': ['Comedy'],
-   'name': 'The Grand Budapest Hotel', 'id': '/en/001'}
-  filmlist.append(firstFilm)
   cursor = do_query(filmlist)
   i=0
   while(cursor):

Modified: lucene/dev/branches/branch_5x/solr/example/films/films.csv
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_5x/solr/example/films/films.csv?rev=1650689&r1=1650688&r2=1650689&view=diff
==============================================================================
--- lucene/dev/branches/branch_5x/solr/example/films/films.csv (original)
+++ lucene/dev/branches/branch_5x/solr/example/films/films.csv Fri Jan  9 22:40:14 2015
@@ -1,5 +1,4 @@
 name,directed_by,genre,type,id,initial_release_date
-The Grand Budapest Hotel,Wes Anderson,Comedy,,/en/001,2014-03-28
 .45,Gary Lennon,Black comedy|Thriller|Psychological thriller|Indie film|Action Film|Crime Thriller|Crime Fiction|Drama,,/en/45_2006,2006-11-30
 9,Shane Acker,Computer Animation|Animation|Apocalyptic and post-apocalyptic fiction|Science Fiction|Short Film|Thriller|Fantasy,,/en/9_2005,2005-04-21
 69,Lee Sang-il,Japanese Movies|Drama,,/en/69_2004,2004-07-10
@@ -100,7 +99,7 @@ Essential Keys To Better Bowling,,Docume
 Adventures Into Digital Comics,Sébastien Dumesnil,Documentary film,,/en/adventures_into_digital_comics,
 Ae Fond Kiss...,Ken Loach,Romance Film|Drama,,/en/ae_fond_kiss,2004-02-13
 Aetbaar,Vikram Bhatt,Thriller|Romance Film|Mystery|Horror|Musical|Bollywood|World cinema|Drama|Musical Drama,,/en/aetbaar,2004-01-23
-Aethiree,K. S. Ravikumar,Comedy|Tamil cinema|World cinema,,/en/aethiree,2004-04-23
+Aethirree,K. S. Ravikumar,Comedy|Tamil cinema|World cinema,,/en/aethiree,2004-04-23
 After Innocence,Jessica Sanders,Documentary film|Crime Fiction|Political cinema|Culture &amp; Society|Law &amp; Crime|Biographical film,,/en/after_innocence,
 After the Sunset,Brett Ratner,Crime Fiction|Action/Adventure|Action Film|Crime Thriller|Heist film|Caper story|Crime Comedy|Comedy,,/en/after_the_sunset,2004-11-10
 Aftermath,Thomas Farone,Crime Fiction|Thriller,,/en/aftermath_2007,2013-03-01
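
The CSV rows above pack several genres (and sometimes several directors) into one column separated by "|", which is why the README's CSV post command sets f.genre.split, f.directed_by.split and the matching separator parameters. As a rough illustration of that splitting, and not Solr's own CSV loader, here is a small Python sketch. It uses three rows copied from the diff. The function name parse_films and the choice to drop empty single-valued columns are assumptions made for this example.

```python
import csv
import io

# Header and rows copied from the films.csv diff in this commit.
SAMPLE = (
    "name,directed_by,genre,type,id,initial_release_date\n"
    ".45,Gary Lennon,Black comedy|Thriller|Psychological thriller|Indie film|"
    "Action Film|Crime Thriller|Crime Fiction|Drama,,/en/45_2006,2006-11-30\n"
    "69,Lee Sang-il,Japanese Movies|Drama,,/en/69_2004,2004-07-10\n"
    "Adventures Into Digital Comics,Sébastien Dumesnil,Documentary film,,"
    "/en/adventures_into_digital_comics,\n"
)

# Columns the README's post command marks with f.<field>.split=true.
MULTI_VALUED = {"genre", "directed_by"}


def parse_films(text, separator="|"):
    """Turn CSV text into a list of dicts, splitting multi-valued columns."""
    docs = []
    for row in csv.DictReader(io.StringIO(text)):
        doc = {}
        for key, value in row.items():
            if key in MULTI_VALUED:
                # An empty column becomes an empty list rather than [""].
                doc[key] = value.split(separator) if value else []
            elif value:
                # Assumption for this sketch: skip empty single-valued columns.
                doc[key] = value
        docs.append(doc)
    return docs


docs = parse_films(SAMPLE)
for doc in docs:
    print(doc["id"], doc["genre"])
```

With these rows, the first document carries eight genres, and the third has no initial_release_date key because that column is empty in the source row.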