Posted to commits@lucene.apache.org by eh...@apache.org on 2014/11/10 02:01:02 UTC

svn commit: r1637767 - /lucene/cms/branches/solr_6058/content/solr/quickstart.mdtext

Author: ehatcher
Date: Mon Nov 10 01:01:01 2014
New Revision: 1637767

URL: http://svn.apache.org/r1637767
Log:
site quick start, ready to roll (but plenty more work to flesh it out further in the future)

Modified:
    lucene/cms/branches/solr_6058/content/solr/quickstart.mdtext

Modified: lucene/cms/branches/solr_6058/content/solr/quickstart.mdtext
URL: http://svn.apache.org/viewvc/lucene/cms/branches/solr_6058/content/solr/quickstart.mdtext?rev=1637767&r1=1637766&r2=1637767&view=diff
==============================================================================
--- lucene/cms/branches/solr_6058/content/solr/quickstart.mdtext (original)
+++ lucene/cms/branches/solr_6058/content/solr/quickstart.mdtext Mon Nov 10 01:01:01 2014
@@ -5,18 +5,12 @@ Title: Quick Start
   <li><a href="/solr/resources.html">Resources</a></li>
 </ul>
 
-# Solr<sup>&trade;</sup> Quick Start
+# Solr Quick Start
 
 ***
 
 ## Overview
 
-<!--
-  TODO: Where to mention (or not?) the Solr version number this is for?   It's intentionally embedded in the examples below, at least.
-
-  4.10.2 was used to write this quick start guide
--->
-
 This document covers getting Solr up and running, ingesting a variety of data sources into multiple collections, and getting a feel
 for the Solr administrative and search interfaces.
 
@@ -99,9 +93,7 @@ Or if you prefer, you can make every jav
 
 ### Indexing a directory of "rich" files
 
-Let's first index local "rich" files (HTML, PDF, text, and many other supported formats).  `SimplePostTool` features the ability to crawl a directory
-of files, optionally recursively even, sending the raw content of each file into Solr for extraction and indexing.   A Solr install includes a docs/
-subdirectory, so that makes a convenient set of (mostly) HTML files built-in to start with.
+Let's first index local "rich" files, including HTML, PDF, Microsoft Office formats (such as MS Word), plain text, and many other formats.  `SimplePostTool` can crawl a directory of files, optionally recursively, sending the raw content of each file to Solr for extraction and indexing.  A Solr install includes a docs/ subdirectory, which makes a convenient built-in set of (mostly) HTML files to start with.
 
     java -Dauto -Drecursive org.apache.solr.util.SimplePostTool docs/
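+
+Note that invoking `SimplePostTool` by class name like this assumes the Solr core jar is on your classpath; the full script at the end of this guide sets it first, e.g. for a 4.10.2 install:
+
+    export CLASSPATH=dist/solr-core-4.10.2.jar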
 
@@ -166,7 +158,7 @@ You can index all of the sample data, us
     COMMITting Solr index changes to http://localhost:8983/solr/update..
     Time spent: 0:00:00.453
 
-...and now you can search for all sorts of things using the default [Solr Query Syntax]() (a superset of the Lucene query syntax)...
+...and now you can search for all sorts of things using the default [Solr Query Syntax](https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser#TheStandardQueryParser-SpecifyingTermsfortheStandardQueryParser) (a superset of the Lucene query syntax)...
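+
+For example, assuming the default collection name of `collection1`, a query for documents whose name field contains "memory" might look like this (`wt=json` asks for JSON-formatted results):
+
+    curl "http://localhost:8983/solr/collection1/select?q=name:memory&wt=json&indent=true"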
 
 
 NOTE:
@@ -178,14 +170,14 @@ Your own data may not look ideal at firs
 
 ### Indexing JSON
 
-Solr supports indexing JSON, either arbitrary structured JSON or "Solr JSON" format which is similiar to Solr XML.  
+Solr supports indexing JSON, either arbitrary structured JSON or "Solr JSON" (which is similar to Solr XML).
 
 Solr includes a small sample Solr JSON file to illustrate this capability.  Again using `SimplePostTool`, index the sample JSON file:
 
     /solr-4.10.2:$ java -Dauto org.apache.solr.util.SimplePostTool example/exampledocs/books.json
     SimplePostTool version 1.5
     Posting files to base url http://localhost:8983/solr/update..
-    Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
+    Entering auto mode. File endings considered are xml,json,csv,...
     POSTing file books.json (application/json)
     1 files indexed.
     COMMITting Solr index changes to http://localhost:8983/solr/update..
@@ -193,37 +185,48 @@ Solr includes a small sample Solr JSON f
 
 Because the SimplePostTool defaults to assuming files are in Solr XML format, the `-Dauto` switch is used to post JSON files so that it uses the appropriate content type.
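+
+For reference, a Solr JSON file like books.json is simply an array of documents, each a map of field names to values.  A minimal sketch (the field values here are hypothetical):
+
+    [
+      {
+        "id"     : "1234567890",
+        "name"   : "A sample book",
+        "author" : "An Author"
+      }
+    ]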
 
+To flatten and index arbitrary structured JSON (a topic beyond this quick start guide), check out [how to transform and flatten JSON](https://issues.apache.org/jira/browse/SOLR-6304).
+
 ### Indexing CSV (Comma/Column Separated Values)
 
+CSV is a great conduit of data into Solr, especially when the documents are homogeneous and generally all have the same set of fields.  CSV can be conveniently exported from a spreadsheet such as Excel, or from databases such as MySQL.  When getting started with Solr, it is often easiest to get your structured data into CSV format and index that, rather than building a more sophisticated single-step operation.
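+
+A CSV file for Solr is simply a header row of field names followed by one data row per document.  A minimal hypothetical sketch:
+
+    id,name,author,price
+    1234567890,A sample book,An Author,7.99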
 
+Using SimplePostTool, index the included example CSV data file:
 
-* Import records from a database using the [Data Import Handler (DIH)]().
-    
-* [Load a CSV file]() (comma separated values), including those exported by Excel or MySQL.
+    /solr-4.10.2:$ java -Dauto org.apache.solr.util.SimplePostTool example/exampledocs/books.csv
+    SimplePostTool version 1.5
+    Posting files to base url http://localhost:8983/solr/update..
+    Entering auto mode. File endings considered are xml,json,csv,...
+    POSTing file books.csv (text/csv)
+    1 files indexed.
+    COMMITting Solr index changes to http://localhost:8983/solr/update..
+    Time spent: 0:00:00.084
 
-* Index binary documents such as Word and PDF with [Solr Cell]() (ExtractingRequestHandler).
+### Other indexing techniques
 
-* Use [SolrJ]() for Java or other Solr clients to programatically create documents to send to Solr.
+* Import records from a database using the [Data Import Handler (DIH)](https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler).
+    
+* Use [SolrJ](https://cwiki.apache.org/confluence/display/solr/Using+SolrJ) for Java or other Solr clients to programmatically create documents to send to Solr.
 
 ***
 
 ## Updating Data
 
-You may have noticed that even though the file `solr.xml` has now been POSTed to the server twice, you still only get 1 result when searching for "solr". This is because the example `schema.xml` specifies a "`uniqueKey`" field called "id". Whenever you POST commands to Solr to add a document with the same value for the uniqueKey as an existing document, it automatically replaces it for you. You can see that that has happened by looking at the values for numDocs and maxDoc in the "CORE"/searcher section of the statistics page...
+You may notice that even if you index content in this guide more than once, it does not create duplicate results. This is because the example `schema.xml` specifies a "`uniqueKey`" field called "id". Whenever you POST commands to Solr to add a document with the same value for the uniqueKey as an existing document, it automatically replaces it for you. You can see that this has happened by looking at the values for numDocs and maxDoc in the "CORE"/searcher section of the statistics page...
 
 <http://localhost:8983/solr/#/collection1/plugins/core?entry=searcher>
 
-numDocs represents the number of searchable documents in the index (and will be larger than the number of XML files since some files contained more than one <doc>). maxDoc may be larger as the maxDoc count includes logically deleted documents that have not yet been removed from the index. You can re-post the sample XML files over and over again as much as you want and numDocs will never increase, because the new documents will constantly be replacing the old.
+numDocs represents the number of searchable documents in the index (and will be larger than the number of XML, JSON, or CSV files, since some files contained more than one document).  The maxDoc value may be larger because the maxDoc count includes logically deleted documents that have not yet been removed from the index. You can re-post the sample files as often as you want and numDocs will never increase, because the new documents will constantly replace the old.
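+
+For example, re-posting the sample XML files a second time leaves numDocs unchanged, because each document simply replaces its earlier version:
+
+    java org.apache.solr.util.SimplePostTool example/exampledocs/*.xml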
 
-Go ahead and edit the existing XML files to change some of the data, and re-run the java -jar post.jar command, you'll see your changes reflected in subsequent searches.
+Go ahead and edit any of the existing example data files, change some of the data, and re-run the SimplePostTool command.  You'll see your changes reflected in subsequent searches.
 
 ## Deleting Data
 
-You can delete data by POSTing a delete command to the update URL and specifying the value of the document's unique key field, or a query that matches multiple documents (be careful with that one!). Since these commands are smaller, we will specify them right on the command line rather than reference an XML file.
+You can delete data by POSTing a delete command to the update URL and specifying the value of the document's unique key field, or a query that matches multiple documents (be careful with that one!). Since these commands are smaller, we specify them right on the command line rather than referencing a JSON or XML file.
 
 Execute the following command to delete a specific document:
 
-    java -Ddata=args -Dcommit=false -jar post.jar "<delete><id>SP2514N</id></delete>"
+    java -Ddata=args -jar post.jar "<delete><id>SP2514N</id></delete>"
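+
+To delete every document matching a query instead (again, be careful!), the delete command accepts a `<query>` element in place of `<id>`.  A hypothetical example that would delete all documents with "DDR" in the name field:
+
+    java -Ddata=args -jar post.jar "<delete><query>name:DDR</query></delete>"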
 
 ***
 
@@ -238,22 +241,38 @@ Execute the following command to delete 
 
 ## Wrapping up
 
-Cleanup:
-   bin/solr stop -all ; rm -Rf node1/ node2/ 
+If you've run the full set of commands in this quick start guide, you have done the following:
 
+* Launched Solr in SolrCloud mode with two nodes and two collections, including shards and replicas
+* Indexed a directory of rich text files
+* Indexed Solr XML files
+* Indexed Solr JSON files
+* Indexed CSV content
+* Opened the admin console and used its query interface to get JSON-formatted results
+* Opened the /browse interface to explore Solr's features in a friendlier, more familiar way
 
-Full script and then console output:
+Nice work!  The script below, which runs all of these items, took under one and a half minutes!  (Your run time may vary, depending on your computer's power and available resources.)
 
-export CLASSPATH=dist/solr-core-4.10.2.jar
-date ;
-bin/solr start -e cloud -noprompt ; 
-   open http://localhost:8983/solr ;
-   java -Dauto -Drecursive org.apache.solr.util.SimplePostTool docs/ ; 
-   java org.apache.solr.util.SimplePostTool example/exampledocs/*.xml ;
-   open http://localhost:8983/solr/collection1/browse ;
-date ;
+Here's the full Unix script, for convenient copying and pasting, to run all of the commands in this quick start guide:
+
+    export CLASSPATH=dist/solr-core-4.10.2.jar
+    date ;
+    bin/solr start -e cloud -noprompt ; 
+       open http://localhost:8983/solr ;
+       java -Dauto -Drecursive org.apache.solr.util.SimplePostTool docs/ ; 
+       open http://localhost:8983/solr/collection1/browse ;
+       java org.apache.solr.util.SimplePostTool example/exampledocs/*.xml ;
+       java -Dauto org.apache.solr.util.SimplePostTool example/exampledocs/books.json ;
+       java -Ddata=args org.apache.solr.util.SimplePostTool "<delete><id>SP2514N</id></delete>" ;
+    date ;
 
 
+### Cleanup
+
+As you work through this guide, you may want to stop Solr and reset the environment back to the starting point.  The following command line will stop Solr and remove the directories for each of the two nodes that the start script created:
+
+    bin/solr stop -all ; rm -Rf node1/ node2/
+