Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2008/10/24 15:52:57 UTC

[Solr Wiki] Update of "DataImportHandler" by GistoLero

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by GistoLero:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
Comments from solr-user mail-list added

------------------------------------------------------------------------------
  
   * '''full-import''' : Full Import operation can be started by hitting the URL `http://<host>:<port>/solr/dataimport?command=full-import`
    * This operation will be started in a new thread and the ''status'' attribute in the response should now show ''busy''.
-   * The operation may take some time depending on size of dataset
+   * The operation may take some time depending on size of dataset.
    * When the full-import command is executed, it stores the start time of the operation in a file located at ''conf/dataimport.properties''.
    * This stored timestamp is used when a delta-import operation is executed.
+   * Queries to Solr are not blocked during full-imports.
    * It takes the following extra parameters:
     * '''clean''' : (default 'true'). Tells whether to clean up the index before the indexing is started.
     * '''commit''': (default 'true'). Tells whether to commit after the operation.
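To show how the full-import URL and its extra parameters fit together, here is a minimal Java sketch that only builds the request URL and does not send it. The host name, port 8983, and the method name fullImportUrl are assumptions for illustration; only the /solr/dataimport path and the command, clean, and commit parameters come from the text above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FullImportUrl {
    // Hypothetical helper: builds http://<host>:<port>/solr/dataimport?command=full-import&clean=..&commit=..
    static String fullImportUrl(String host, int port, boolean clean, boolean commit) {
        // LinkedHashMap keeps the parameters in insertion order, so the URL is predictable.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("command", "full-import");
        params.put("clean", Boolean.toString(clean));   // documented default is 'true'
        params.put("commit", Boolean.toString(commit)); // documented default is 'true'

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return "http://" + host + ":" + port + "/solr/dataimport?" + query;
    }

    public static void main(String[] args) {
        // prints http://localhost:8983/solr/dataimport?command=full-import&clean=false&commit=true
        System.out.println(fullImportUrl("localhost", 8983, false, true));
    }
}
```

Requesting such a URL (for example with java.net.http.HttpClient on Java 11+) starts the import in a new thread, so the immediate response is expected to report ''busy'' rather than the final result.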
@@ -414, +415 @@

  }}}
  
  Indexing 7278241 articles took around 2 hours 40 minutes, with peak memory usage of around 4GB.
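As a rough average derived only from the two figures above (the text does not say whether the rate was steady during the run), the throughput works out to:

```latex
2\,\text{h}\;40\,\text{min} = 9600\,\text{s}, \qquad
\frac{7\,278\,241\ \text{articles}}{9600\ \text{s}} \approx 758\ \text{articles/s}
```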
-  
+ 
+ == Using delta-import command ==
+ The only EntityProcessor which supports delta is !SqlEntityProcessor; the X!PathEntityProcessor does not implement it yet. So, unfortunately, there is no delta support for XML at this time.
+ If you want to implement delta support in X!PathEntityProcessor, the methods involved are explained in !EntityProcessor.java.
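The idea behind delta-import, re-indexing only what changed since the timestamp a previous import stored in conf/dataimport.properties, can be sketched in plain Java as below. This is a conceptual illustration, not the SqlEntityProcessor implementation. The property key last_index_time, the Row type, and the sample rows are assumptions for illustration; the timestamp format reuses the `yyyy-MM-dd HH:mm:ss` pattern mentioned elsewhere on this page.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class DeltaSketch {
    // Hypothetical row type: a primary key plus a last-modified timestamp column.
    static final class Row {
        final String id;
        final LocalDateTime lastModified;
        Row(String id, LocalDateTime lastModified) {
            this.id = id;
            this.lastModified = lastModified;
        }
    }

    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Reads the stored start time of the previous import; the key name is an assumption.
    static LocalDateTime lastIndexTime(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty("last_index_time");
        if (value == null) {
            throw new IllegalArgumentException("no last_index_time entry");
        }
        return LocalDateTime.parse(value, FMT);
    }

    // Keys of rows modified after the given time: the only rows a delta pass would re-index.
    static List<String> modifiedKeys(List<Row> rows, LocalDateTime since) {
        List<String> keys = new ArrayList<>();
        for (Row r : rows) {
            if (r.lastModified.isAfter(since)) {
                keys.add(r.id);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        LocalDateTime since = lastIndexTime("last_index_time=2008-10-01 00:00:00\n");
        List<Row> rows = List.of(
                new Row("1", LocalDateTime.of(2008, 9, 20, 9, 0)),   // unchanged since last import
                new Row("2", LocalDateTime.of(2008, 10, 5, 8, 30))); // modified after last import
        System.out.println(modifiedKeys(rows, since)); // prints [2]
    }
}
```

An XML processor would need an equivalent way to tell which records changed, which is presumably why delta support there requires implementing extra methods rather than coming for free.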
+ 
  = Extending the tool with APIs =
  The examples we explored are admittedly trivial. It is not possible to meet all user needs with an XML configuration alone, so we expose a few abstract classes which can be implemented by the user to enhance the functionality.
  
@@ -631, +636 @@

   * '''`excludes`''' : A regex pattern of file names to exclude.
   * '''`newerThan`''' : A date param. Use the format (`yyyy-MM-dd HH:mm:ss`). It can also be a datemath string, e.g. ('NOW-3DAYS'); the single quotes are necessary. Or it can be a valid variable resolver expression like (${var.name}).
   * '''`olderThan`''' : A date param. Same rules as above.
+  * '''`rootEntity=false`''' : An entity directly under the <document> is a root entity, which means that one document is created in Solr/Lucene for each row emitted by the root entity. In this case, however, we do not want one document per file; we want one document per row emitted by the following entity 'x'. Because the entity 'f' has rootEntity=false, the entity directly under it automatically becomes a root entity, and each row it emits becomes a document.
+  * '''`dataSource=null`''' : In most cases there is only one DataSource (a JdbcDataSource), and all entities use it. A FileListEntityProcessor does not need a DataSource, so dataSource=null simply means that no DataSource instance is created for this entity.
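The rootEntity=false rule can be modelled in a few lines of plain Java. This sketch only counts which rows become documents; it is not DataImportHandler code. The entity names 'f' and 'x' come from the text, while the file names, the two rows per file, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RootEntitySketch {
    // Hypothetical stand-in for entity 'x': the rows it emits for one file listed by 'f'.
    static List<String> rowsOfX(String file) {
        return List.of(file + "#row1", file + "#row2");
    }

    // Returns one label per Solr document that would be created.
    static List<String> documents(List<String> filesFromF, boolean fIsRootEntity) {
        List<String> docs = new ArrayList<>();
        for (String file : filesFromF) {
            if (fIsRootEntity) {
                // Default: 'f' is the root entity, so each file yields exactly one document
                // (rows of 'x' would only contribute fields to it).
                docs.add(file);
            } else {
                // rootEntity=false on 'f': 'x' becomes the root entity,
                // so every row it emits becomes its own document.
                docs.addAll(rowsOfX(file));
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.xml", "b.xml");
        System.out.println(documents(files, true));  // prints [a.xml, b.xml]
        System.out.println(documents(files, false)); // prints [a.xml#row1, a.xml#row2, b.xml#row1, b.xml#row2]
    }
}
```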
   example:
  {{{
  <dataConfig>