Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2008/10/02 21:04:47 UTC

[Solr Wiki] Trivial Update of "DataImportHandler" by GrantIngersoll

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by GrantIngersoll:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
  [[Anchor(entityprocessor)]]
  == EntityProcessor ==
  Each entity is handled by a default entity processor called !SqlEntityProcessor. This works well for systems that use an RDBMS as a datasource. For other kinds of datasources, such as REST or non-SQL datasources, you can extend the abstract class `org.apache.solr.handler.dataimport.EntityProcessor`. It is designed to stream rows one by one from an entity. The simplest way to implement your own !EntityProcessor is to extend !EntityProcessorBase and override the `public Map<String,Object> nextRow()` method.
- '!EntityProcessor' rely on the !DataSource for fetching data. The return type of the !DataSource is important for an !EntityProcessor. The in-built ones are,
+ '!EntityProcessor' rely on the !DataSource for fetching data. The return type of the !DataSource is important for an !EntityProcessor. The built-in ones are,
  === SqlEntityProcessor ===
  This is the default. The !DataSource must be of type `DataSource<Iterator<Map<String, Object>>>`. !JdbcDataSource can be used with this.
  === XPathEntityProcessor ===
@@ -784, +784 @@

  == Field declarations ==
  Fields declared in the <entity> tags provide extra information that cannot be derived automatically. The tool relies on the 'column' values to fetch values from the results. The fields you explicitly add in the configuration are equivalent to the fields present in the Solr schema.xml (implicit fields). Each declared field automatically inherits all the attributes present in schema.xml; however, you cannot add extra configuration to it. Add field entries when,
   * The fields emitted from the !EntityProcessor have different names than the fields in schema.xml
-  * With in-built transformers . They expect extra information to decide which fields to process and how to process
+  * With built-in transformers . They expect extra information to decide which fields to process and how to process
   * !XPathEntityProcessor or any other processor that explicitly demands extra information in each field
  == What is a row? ==
  A row in !DataImportHandler is a Map (`Map<String, Object>`). In the map, the key is the name of the field and the value can be anything that is a valid Solr type. The value can also be a Collection of valid Solr types (this may get mapped to a multi-valued field). If the !DataSource is an RDBMS, a single query cannot emit a multivalued field. It is, however, possible to create a multivalued field by joining an entity with another, i.e. if the sub-entity returns multiple rows for one row from the parent entity, those values can go into a multivalued field. If the datasource is XML, it is possible to return a multivalued field directly.
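
The row model described above can be illustrated with a minimal, self-contained Java sketch. It does not use any !DataImportHandler classes: the class `RowSketch`, the method `joinRows`, and the column names `id`, `name`, and `feature` are all hypothetical and chosen only for illustration. It shows how several sub-entity rows for one parent row could end up as a single Collection value, which is the shape that may get mapped to a multivalued field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the DataImportHandler API.
class RowSketch {

    // Returns a copy of the parent row with one extra key: the values of
    // childColumn gathered from every sub-entity row, stored as a List.
    static Map<String, Object> joinRows(Map<String, Object> parentRow,
                                        List<Map<String, Object>> childRows,
                                        String childColumn) {
        Map<String, Object> row = new LinkedHashMap<>(parentRow);
        List<Object> values = new ArrayList<>();
        for (Map<String, Object> child : childRows) {
            Object value = child.get(childColumn);
            if (value != null) {
                values.add(value);
            }
        }
        // Only add the key when at least one sub-entity row supplied a value.
        if (!values.isEmpty()) {
            row.put(childColumn, values);
        }
        return row;
    }

    public static void main(String[] args) {
        // One row from the parent entity: field name -> single value.
        Map<String, Object> item = new LinkedHashMap<>();
        item.put("id", 1);
        item.put("name", "Widget");

        // Two rows returned by the sub-entity for that parent row.
        List<Map<String, Object>> features = new ArrayList<>();
        features.add(Collections.<String, Object>singletonMap("feature", "red"));
        features.add(Collections.<String, Object>singletonMap("feature", "small"));

        // prints {id=1, name=Widget, feature=[red, small]}
        System.out.println(joinRows(item, features, "feature"));
    }
}
```

In this sketch the single-valued keys keep plain objects as values, while `feature` holds a List, mirroring the distinction between single-valued and multivalued fields.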