Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2011/07/17 16:06:56 UTC

[Solr Wiki] Update of "FAQ" by Gabriele

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "FAQ" page has been changed by Gabriele:
http://wiki.apache.org/solr/FAQ?action=diff&rev1=77&rev2=78

  <<TableOfContents>>
  
  = General =
- 
  == What is Solr? ==
- 
  Solr is a standalone enterprise search server which applications communicate with using XML and HTTP to index documents or execute searches.  Solr supports a rich schema specification that allows for a wide range of flexibility in dealing with different document fields, and has an extensive search plugin API for developing custom search behavior.
  
  For more information please read this [[http://lucene.apache.org/solr/features.html|overview of Solr features]].
  
  == Are there Mailing lists for Solr? ==
- 
- Yes there are several
- [[http://lucene.apache.org/solr/mailing_lists.html|Solr email lists]].
+ Yes there are several [[http://lucene.apache.org/solr/mailing_lists.html|Solr email lists]].
  
+ Here are some guidelines for effectively using the email lists [[UsingMailingLists|Getting the most out of the email lists]].
- Here are some guidelines for effectively using the email lists
- [[UsingMailingLists|Getting the most out of the email lists]].
  
  == How do you pronounce Solr? ==
- 
  It's pronounced the same as you would pronounce "Solar".
  
  == What does Solr stand for? ==
- 
  Solr is not an acronym.
  
  == Where did Solr come from? ==
- 
- "Solar" (with an A) was initially developed by [[http://cnetnetworks.com|CNET Networks]] as an in-house search platform beginning in late fall 2004.  By summer 2005, CNET's product catalog was powered by Solar, and several other CNET applications soon followed.  In January 2006 CNET [[http://issues.apache.org/jira/browse/SOLR-1|Granted the existing code base to the ASF]] to become the "Solr" project.  On January 17, 2007 Solr [[http://mail-archives.apache.org/mod_mbox/lucene-general/200701.mbox/%3Cc68e39170701170707q3945a14aj5923acb0d3e1f963@mail.gmail.com%3E|graduated from the Apache Incubator]] to become a Lucene subproject.
+ "Solar" (with an A) was initially developed by [[http://cnetnetworks.com|CNET Networks]] as an in-house search platform beginning in late fall 2004.  By summer 2005, CNET's product catalog was powered by Solar, and several other CNET applications soon followed.  In January 2006 CNET [[http://issues.apache.org/jira/browse/SOLR-1|Granted the existing code base to the ASF]] to become the "Solr" project.  On January 17, 2007 Solr [[http://mail-archives.apache.org/mod_mbox/lucene-general/200701.mbox/<c6...@mail.gmail.com>|graduated from the Apache Incubator]] to become a Lucene subproject. In March 2010, The Solr and Lucene-java subprojects merged into a single project.
- In March 2010, The Solr and Lucene-java subprojects merged into a single project.
  
  == Is Solr Stable? Is it "Production Quality?" ==
- 
  Solr is currently being used to power search applications on several [[PublicServers|high traffic publicly accessible websites]].
  
  == Is Solr Schema-less? ==
- 
- Yes, in the ways that count.  Solr does have a schema to define types, but it's a "free" schema in that
- you don't have to define all of your fields ahead of time. Using {{{<dynamicField />}}} declarations, you can configure field types based on field naming convention, and each document you index can have a different set of fields.
+ Yes, in the ways that count.  Solr does have a schema to define types, but it's a "free" schema in that you don't have to define all of your fields ahead of time. Using {{{<dynamicField />}}} declarations, you can configure field types based on field naming convention, and each document you index can have a different set of fields.
  
  = Using =
- 
  == Do my applications have to be written in Java to use Solr? ==
- 
  No.
  
- Solr itself is a Java Application, but all interaction with Solr is done by POSTing messages over HTTP (in JSON, XML, CSV, or binary formats) to index documents and GETing search results back as JSON, XML, or a variety of other formats (Python, Ruby, PHP, CSV, binary, etc...) 
+ Solr itself is a Java Application, but all interaction with Solr is done by POSTing messages over HTTP (in JSON, XML, CSV, or binary formats) to index documents and GETing search results back as JSON, XML, or a variety of other formats (Python, Ruby, PHP, CSV, binary, etc...)
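
  For example, indexing boils down to POSTing an XML message like the following to the update handler (the field names here are purely illustrative):

  {{{
  <add>
    <doc>
      <field name="id">doc1</field>
      <field name="name">A sample document</field>
    </doc>
  </add>
  }}}
  Searching is then just an HTTP GET, e.g. `/select?q=name:sample&wt=json`, with the `wt` parameter selecting the response format.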
  
  == What are the Requirements for running a Solr server? ==
- 
  Solr requires Java 1.5 and an Application server (such as Tomcat) which supports the Servlet 2.4 standard.
  
  == How can I get started playing with Solr? ==
- 
  There is an [[http://lucene.apache.org/solr/tutorial.html|online tutorial]] as well as a [[http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/|demonstration configuration in SVN]].
  
  == Solr Comes with Jetty, is Jetty the recommended Servlet Container to use when running Solr? ==
+ The Solr example app has Jetty in it just because at the time we set it up, Jetty was the simplest/smallest servlet container we found that could be run easily in a cross platform way (ie: "java -jar start.jar").  That does not imply that Solr runs better under Jetty, or that Jetty is only good enough for demos -- it's just that Jetty made our demo setup easier.
- 
- The Solr example app has Jetty in it just because at the time we set it up, Jetty
- was the simplest/smallest servlet container we found that could be run
- easily in a cross platform way (ie: "java -jar start.jar").  That does not imply
- that Solr runs better under Jetty, or that Jetty is only good enough for demos --
- it's just that Jetty made our demo setup easier.
  
  Users should decide for themselves which Servlet Container they consider the easiest/best for their use cases based on their needs/experience. For high traffic scenarios, investing time for tuning the servlet container can often make a big difference.
  
  == How do I change the logging levels/files/format ? ==
- 
  See SolrLogging
  
  == I POSTed some documents, why don't they show up when I search? ==
- 
  Documents that have been added to the index don't show up in search results until a commit is done (one way is to POST a <commit/> message to the XML update handler). This allows you to POST many documents in succession and know that none of them will be visible to search clients until you have finished.
  
  == How can I delete all documents from my index? ==
- 
  Use the "match all docs" query in a delete by query command: {{{<delete><query>*:*</query></delete>}}}
  
  This has been optimized to be more efficient than deleting by some arbitrary query which just happens to match all docs because of the nature of the data.
  
  == How can I rebuild my index from scratch if I change my schema? ==
- 
   1. Use the "match all docs" query in a delete by query command before shutting down Solr: {{{<delete><query>*:*</query></delete>}}}
   1. Stop your server
   1. Change your schema.xml
@@ -93, +68 @@

  One can also delete all documents, change the schema.xml file, and then [[CoreAdmin|reload the core]] without shutting down Solr.
  
  == How can I update a specific field of an existing document? ==
- 
  I want to update a specific field in a document, is that possible? I only need to update one field for a specific document. Do I have to re-index the whole document for this?
  
  No, just the one document. Let's say you have a CMS and you edit one document. You will need to re-index only this document, by using the Solr add command for the whole document (not one field only).
  
- In Lucene to update a document the operation is really a delete followed by an add.  You will need to add the complete document as there is no such "update only a field" semantics in Lucene. 
+ In Lucene to update a document the operation is really a delete followed by an add.  You will need to add the complete document as there is no such "update only a field" semantics in Lucene.
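
  As a sketch, if only one field of a product changed you would still re-send the complete document (illustrative field names):

  {{{
  <add>
    <doc>
      <field name="id">prod42</field>
      <field name="name">Widget</field>
      <!-- price is the only field that actually changed -->
      <field name="price">9.99</field>
    </doc>
  </add>
  }}}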
  
  == How do I use copyField with wildcards? ==
- 
  The `<copyField>` directive allows wildcards in the source, so that several fields can be copied into one destination field without having to specify them all individually.  The dest field may be a full field name, or a wildcard expression. A common use case is something like:
  
  {{{
     <copyField source="*_t"  dest="text" />
  }}}
- 
  This tells Solr to copy the contents of any field that ends in "_t" to the "text" field.  This is particularly useful when you have a large, and possibly changing, set of fields you want to index into a single field.  With the example above, you could start indexing fields like "description_t", "editorial_review_t", and so on, and all their content would be indexed in the "text" field.  It's important in this example that the "text" field be defined in schema.xml as multiValued since you intend to copy multiple sources into the single destination.
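
  A sketch of such a destination field declaration (the "text" field and type names match the example schema; adjust them to your own):

  {{{
     <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
  }}}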
  
  Note that you can use the wildcard copyField syntax with or without similar dynamicField declarations.  Thus you could choose to index the "description_t", "editorial_review_t" fields individually with a dynamicField like
@@ -115, +87 @@

  {{{
     <dynamicField name="*_t" type="text" indexed="true" stored="false" />
  }}}
- 
  but you don't have to if you don't want to.  You could even mix and match across different dynamic fields by doing something like
  
  {{{
     <dynamicField name="*_i_t" type="text" indexed="true" stored="false" />
     <copyField source="*_t"  dest="text" />
  }}}
- 
  Now, as you add fields, you can give them names ending in "_i_t" if you want them indexed separately as well as copied into the main "text" field, and "_t" without the "_i" if you just want them indexed in "text" but not individually.
  
- 
  == Why does the request time out sometimes when doing commits? ==
- 
  Internally, Solr does nothing to time out any requests -- it lets both updates and queries take however long they need to take to be processed fully.  However, the servlet container being used to run Solr may impose arbitrary timeout limits on all requests.  Please consult the documentation for your Servlet container if you find that this value is too low.
  
  (In Jetty, the relevant setting is "maxIdleTime" which is in milliseconds)
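
  The exact syntax depends on your Jetty version; as a rough sketch, a Jetty 6 style connector declaration in etc/jetty.xml might look something like this (the port and timeout values are just examples):

  {{{
  <Call name="addConnector">
    <Arg>
      <New class="org.mortbay.jetty.bio.SocketConnector">
        <Set name="port">8983</Set>
        <Set name="maxIdleTime">50000</Set>
      </New>
    </Arg>
  </Call>
  }}}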
  
  == Why don't International Characters Work? ==
- 
  Solr can index any characters expressed in the UTF-8 charset (see [[http://issues.apache.org/jira/browse/SOLR-96|SOLR-96]]). There are no known bugs with Solr's character handling, but there have been some reported issues with the way different application servers (and different versions of the same application server) treat incoming and outgoing multibyte characters.  In particular, people have reported better success with Tomcat than with Jetty...
  
   * "[[http://www.nabble.com/International-Charsets-in-embedded-XML-tf1780147.html#a4897795|International Charsets in embedded XML]]" (Jetty 5.1)
@@ -142, +109 @@

  If you notice a problem with multibyte characters, the first step to ensure that it is not a true Solr bug would be to write a unit test that bypasses the application server directly using the [[http://lucene.apache.org/solr/api/org/apache/solr/util/AbstractSolrTestCase.html|AbstractSolrTestCase]].
  
  The most important points are:
+ 
   * The document has to be indexed as UTF-8 encoded on the Solr server. For example, if you send an ISO encoded document, then the special ISO characters get a byte added (screwing up the final encoding; only re-indexing with UTF-8 can fix this).
-  * The client needs UTF-8 URL encoding when forwarding the search request to the solr server. 
+  * The client needs UTF-8 URL encoding when forwarding the search request to the solr server.
   * The server needs to support UTF-8 query strings. See e.g. [[http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config|Solr with Apache Tomcat]].
  
  If you just forward doing:
- {{{
+ 
- #!java
+ {{{#!java
  String value = request.getParameter("q");
+ }}}
- }}} to get the query string, it can be that q got encoded in ISO and then solr will not return a search result.
+ to get the query string, it can be that q got encoded in ISO and then solr will not return a search result.
  
  One possible solution is:
- {{{
+ 
- #!java
+ {{{#!java
  String encoding = request.getCharacterEncoding();
  if (null == encoding) {
-   // Set your default encoding here 
+   // Set your default encoding here
    request.setCharacterEncoding("UTF-8");
  } else {
    request.setCharacterEncoding(encoding);
@@ -165, +134 @@

  ...
  String value = request.getParameter("q");
  }}}
- 
  Another possibility is to use java.net.URLDecoder/URLEncoder to transform all parameter values to UTF-8.
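
  For the server-side URI encoding mentioned above, Tomcat reads it from the connector declaration in server.xml; a minimal sketch (other connector attributes omitted):

  {{{
  <Connector port="8080" URIEncoding="UTF-8"/>
  }}}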
  
  == Solr started, and I can POST documents to it, but the admin screen doesn't work ==
- 
  The admin screens are implemented using JSPs which require a JDK (instead of just a JRE) to be compiled on the fly.  If you encounter errors trying to load the admin pages, and the stack traces of these errors seem to relate to compilation of JSPs, make sure you have a JDK installed, and make sure it is the instance of java being used.
  
  NOTE: Some Servlet Containers (like Tomcat5.5 and Jetty6) don't require a JDK for JSPs.
@@ -179, +146 @@

  
  Restarting Solr after creating a $(jetty.home)/work directory for Jetty's work files should solve the problem.
  
- This might also be caused by starting two Solr instances on the same port and killing one, see [[http://issues.apache.org/jira/browse/SOLR-118#action_12507990|Hoss's comment]] in SOLR-118. 
+ This might also be caused by starting two Solr instances on the same port and killing one, see [[http://issues.apache.org/jira/browse/SOLR-118#action_12507990|Hoss's comment]] in SOLR-118.
  
  == What does "CorruptIndexException: Unknown format version" mean ? ==
- 
- This happens when the Lucene code in Solr used to read the index files from disk encounters index files in a format it doesn't recognize.  
+ This happens when the Lucene code in Solr used to read the index files from disk encounters index files in a format it doesn't recognize.
  
  The most common cause is using a version of Solr+Lucene that is older than the version used to create that index.
  
  == What does "exceeded limit of maxWarmingSearchers=X" mean? ==
- 
  Whenever a commit happens in Solr, a new "searcher" (with new caches) is opened, "warmed" up according to your SolrConfigXml settings, and then put in place.  The previous searcher is not closed until the "warming" searcher is ready.  If multiple commits happen in rapid succession -- before the warming searcher from the first commit has had enough time to warm up -- then there can be multiple searchers all competing for resources at the same time, even though one of them will be thrown away as soon as the next one is ready.
  
  maxWarmingSearchers is a setting in SolrConfigXml that helps you put a safety valve on the number of overlapping warming searchers that can exist at one time.  If you see this error it means Solr prevented a commit from resulting in a new searcher being opened because there were already X warming searchers open.
@@ -198, +163 @@

  If you only encounter this error infrequently because of fluke situations, you'll probably be ok just ignoring it.
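
  The limit itself is configured in solrconfig.xml; a sketch (the value shown is just an example):

  {{{
  <maxWarmingSearchers>2</maxWarmingSearchers>
  }}}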
  
  = Searching =
- 
  == How to make the search use AND semantics by default rather than OR? ==
- 
  In `schema.xml`:
+ 
  {{{
  <solrQueryParser defaultOperator="AND"/>
  }}}
- 
  == How do I add full-text summaries to my search results? ==
- 
  Basic highlighting/summarization can be added by adding `hl=true` to the query parameters.  More advanced highlighting is described in HighlightingParameters.
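
  For example, a request might look like the following (the field listed in `hl.fl` is illustrative):

  {{{
  http://localhost:8983/solr/select?q=apache&hl=true&hl.fl=text
  }}}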
  
  == I have set `hl=true` but no summaries are being output ==
- 
  For a field to be summarizable it must be both stored and indexed.  Note that this can significantly increase the index size for large fields (e.g. the main content field of a document).  Consider storing the field using compression (`compressed=true` in the `schema.xml` `fieldType` definition).  Additionally, such a field needs to be tokenized.
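
  As a sketch, a declaration for a large summarizable field might look like this (names are illustrative; `compressed` only applies to the older Solr 1.x style schemas and can also be set on the fieldType):

  {{{
     <field name="content" type="text" indexed="true" stored="true" compressed="true"/>
  }}}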
  
  == I want to add basic category counts to my search results ==
- 
  Solr provides support for "facets" out-of-the-box.  See SimpleFacetParameters.
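
  As a quick illustration, facet counts are requested with a couple of extra query parameters (the field name is illustrative):

  {{{
  http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=category
  }}}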
  
  == How can I figure out why my documents are being ranked the way they are? ==
- 
  Solr uses [[http://lucene.apache.org/|Lucene]] for ranking.  A detailed summary of the ranking calculation can be obtained by adding [[CommonQueryParameters#debugQuery|`debugQuery=true`]] to the query parameter list.  The output takes some getting used to if you are not familiar with Lucene's ranking model.
  
  The [[SolrRelevancyFAQ]] has more information on understanding why documents rank the way they do.
  
  == Why Isn't Sorting Working on my Text Fields? ==
- 
  Lucene Sorting requires that the field you want to sort on be indexed, but it cannot contain more than one "token" per document.  Most Analyzers used on Text fields result in more than one token, so the simplest thing to do is to use copyField to index a second version of your field using the !StrField class.
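
  A sketch of that copyField approach (field names are illustrative; "string" is the solr.StrField based type from the example schema):

  {{{
     <field name="title" type="text" indexed="true" stored="true"/>
     <field name="title_sort" type="string" indexed="true" stored="false"/>
     <copyField source="title" dest="title_sort"/>
  }}}
  You would then sort on the untokenized copy with `sort=title_sort asc` while still searching against "title".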
  
  If you need to do some processing on the field value using !TokenFilters, you can also use the !KeywordTokenizer, see the Solr example schema for more information.
@@ -238, +196 @@

  See also the Solr tutorial and the xml.com article about Solr, listed in the SolrResources.
  
  == How can I get ALL the matching documents back? ... How can I return an unlimited number of rows? ==
- 
  This is impractical in most cases.  People typically only want to do this when they know they are dealing with an index whose size guarantees the result sets will always be small enough that they can feasibly be transmitted in a manageable amount -- but if that's the case, just specify what you consider a "manageable amount" as your `rows` param and get the best of both worlds (all the results when your assumption is right, and a sanity cap on the result size if it turns out your assumptions are wrong).
  
  == Can I use Lucene to access the index generated by SOLR? ==
- 
  Yes, although this is not recommended. Writing to the index is particularly tricky. However, if you do go down this route, there are a couple of things to keep in mind. Be careful that the analysis chain you use in Lucene matches the one used to index the data or you'll get surprising results. Also, be aware that if you open a searcher, you won't see changes that Solr makes to the index unless you reopen the underlying readers.
  
+ == Is there a limit on the number of keywords for a Solr query? ==
+ No. If you make a GET query, through the [[http://localhost:8080/solr/admin/form.jsp|Solr web interface]] for example, you are limited by the maximum URL length supported by the browser.
+ 
  = Performance =
- 
  == How fast is indexing? ==
- 
  Indexing performance varies considerably depending on the size of the documents, the analysis requirements, and the CPU and I/O performance of the machine.  Rates between `10` and `150` docs/s have been reported.
  
  == How can indexing be accelerated? ==
- 
  A few ideas:
+ 
   * Include multiple documents in a single `<add>` operation (see the sketch after this list).  Note: there is no advantage in trying to post a huge number of docs in a single go.  I'd suggest going no further than `10` (full-size docs) to `100` (tiny docs).
   * Ensure you are not performing `<commit/>` until you need to see the updated index.
   * If you are reindexing every document in your index, completely removing the index first can substantially speed up the required time and disk space.
   * Solr can do some, but not all, parts of indexing in parallel.  Indexing on multiple threads can be a boon, particularly if you have multiple CPUs and your analysis requirements are considerable.
-  * Experiment with different `mergeFactor` and `maxBufferedDocs` settings (see [[http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html]]).
+  * Experiment with different `mergeFactor` and `maxBufferedDocs` settings (see http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html).
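
  As a sketch of the first point above, several documents can go in one add message (field names are illustrative):

  {{{
  <add>
    <doc>
      <field name="id">doc1</field>
      <field name="name">first document</field>
    </doc>
    <doc>
      <field name="id">doc2</field>
      <field name="name">second document</field>
    </doc>
  </add>
  }}}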
  
  == How can I speed up facet counts? ==
- 
  Performance problems can arise when faceting on fields/queries with many unique values.  If you are faceting on a tokenized field, consider making it untokenized (field class `solr.StrField`, or using `solr.KeywordTokenizerFactory`).
  
  Also, keep in mind that Solr must construct a filter for every unique value on which you request faceting.  This only has to be done once, and the results are stored in the `filterCache`.  If you are experiencing slow faceting, check the cache statistics for the `filterCache` in the Solr admin.  If there is a large number of cache misses and evictions, try increasing the capacity of the `filterCache`.
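
  The cache itself is configured in solrconfig.xml; a sketch with illustrative sizes:

  {{{
  <filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
  }}}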
  
  == What does "PERFORMANCE WARNING: Overlapping onDeckSearchers=X" mean in my logs? ==
- 
  This warning means that at least one searcher hadn't yet finished warming in the background when a commit was issued and another searcher started warming.  This can not only eat up a lot of RAM (as multiple on-deck searchers warm their caches simultaneously) but it can create a feedback cycle, since the more searchers that are warming in parallel, the longer each one might take to warm.
  
  Typically the way to avoid this error is to either reduce the frequency of commits, or reduce the amount of warming a searcher does while it's on deck (by reducing the work in newSearcher listeners, and/or reducing the autowarmCount on your caches).
  
  See also the `<maxWarmingSearchers/>` option in SolrConfigXml.
  
- 
  = Developing =
- 
  == Where can I find the latest and Greatest Code? ==
- 
  In the [[http://lucene.apache.org/solr/version_control.html|Solr Version Control Repository]].
  
  == Where can I get the javadocs for the classes? ==
- 
  There are currently [[http://lucene.apache.org/solr/docs/api/|nightly Solr javadocs]].
  
  == How can I help? ==
- 
  Joining and participating in discussion on the [[http://lucene.apache.org/solr/mailing_lists.html|developers email list]] is the best way to get your feet wet with Solr development.
  
- There is also a TaskList containing all of the ideas people have had about ways to improve Solr.  Feel free to add your own ideas to this page, or investigate possible implementations of existing ideas.  When you are ready, [[HowToContribute| submit a patch]] with your changes.
+ There is also a TaskList containing all of the ideas people have had about ways to improve Solr.  Feel free to add your own ideas to this page, or investigate possible implementations of existing ideas.  When you are ready, [[HowToContribute|submit a patch]] with your changes.
  
  == How can I submit bug reports, bug fixes or new features? ==
- 
- Bug reports, and [[HowToContribute| patch submissions]] should be entered in [[http://lucene.apache.org/solr/issue_tracking.html|Solr's Bug Tracking Queue]].
+ Bug reports, and [[HowToContribute|patch submissions]] should be entered in [[http://lucene.apache.org/solr/issue_tracking.html|Solr's Bug Tracking Queue]].
  
  == How do I apply patches from JIRA issues? ==
- 
- Information about testing patches can be found on the [[HowToContribute#TestingPatches| How To Contribute]] wiki page
+ Information about testing patches can be found on the [[HowToContribute#TestingPatches|How To Contribute]] wiki page.
  
  == I can't compile Solr, ant says "JUnit not found" or "Could not create task or type of type: junit" ==
- 
  As of September 21, 2007, JUnit's JAR is now included in Solr's source repository, so there is no need to install it separately to run Solr's unit tests.  If ant generates a warning that it doesn't understand the junit task, check that you have an "ant-junit.jar" in your ANT_LIB directory (it should be included when you install apache-ant).
  
  If you are attempting to compile the Solr source tree from prior to September 21, 2007 (including [[Solr1.2]]) you will need to include the junit.jar in your ant classpath.  Please see the [[http://ant.apache.org/manual/OptionalTasks/junit.html|Ant documentation of JUnit]] for notes about where Ant expects to find the JUnit JAR and Ant task JARs.
  
  == How can I start the example application in Debug mode? ==
- 
- You can start the example application in debug mode to debug your java class with your favorite IDE (like eclipse). 
+ You can start the example application in debug mode to debug your Java classes with your favorite IDE (like Eclipse).
+ 
  {{{
  java -Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n -jar start.jar
  }}}
  Then connect to port 8000 and debug.
  
  == Tagging using SOLR ==
- There is a wiki page on some brainstorming on how to implement  
+ There is a wiki page with some brainstorming on how to implement tagging within Solr: [[UserTagDesign]].
- tagging within Solr [UserTagDesign].