Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2008/12/06 08:40:09 UTC

[Solr Wiki] Update of "DIHCustomTransformer" by NoblePaul

The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DIHCustomTransformer

The comment on the change is:
Moved from DataImportHandler Page

New page:
= Writing Custom Transformers =
If you need custom processing on a row before it is sent to Solr, you can write your own transformer. Consider an example use case. Suppose your schema has a single-valued field named "artistName" of type="string" that you want to facet on, so no index-time analysis should be applied to it. The value can contain multiple words, such as "Celine Dion", but your data has extra leading and trailing whitespace that you want to remove. The !WhitespaceAnalyzer in Solr cannot be used because you do not want the value split into multiple tokens. One solution is to write a !TrimTransformer.

== A Simple TrimTransformer ==
{{{
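package foo;

import java.util.Map;

public class TrimTransformer {
	public Object transformRow(Map<String, Object> row) {
		// The column name matches the <field column="artistName" /> entry in data-config.xml
		Object artist = row.get("artistName");
		// Row values are Objects, so check the type before trimming
		if (artist instanceof String)
			row.put("artistName", ((String) artist).trim());

		return row;
	}
}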
}}}
You do not need to extend any class. Write any class that has a method named transformRow with the above signature, and DataImportHandler will instantiate it and call transformRow using reflection. Specify it in your data-config.xml as follows:
{{{
<entity name="artist" query="..." transformer="foo.TrimTransformer">
	<field column="artistName" />
</entity>
}}}
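The reflection-based call described above can be pictured roughly as follows. This is only a sketch of the idea, not DataImportHandler's actual code: `ReflectiveInvoker` and `SampleTrimTransformer` are hypothetical names introduced for illustration, and the real handler does more (caching, error reporting, passing a Context).

```java
import java.lang.reflect.Method;
import java.util.Map;

// Stand-in for a user-written transformer; it extends nothing.
class SampleTrimTransformer {
    public Object transformRow(Map<String, Object> row) {
        Object artist = row.get("artistName");
        if (artist instanceof String) {
            row.put("artistName", ((String) artist).trim());
        }
        return row;
    }
}

// Hypothetical caller: illustrates the reflection idea only.
final class ReflectiveInvoker {
    private ReflectiveInvoker() {}

    // Loads the class by name, creates an instance, and calls transformRow(Map) on it.
    static Object invoke(String className, Map<String, Object> row) {
        try {
            Class<?> clazz = Class.forName(className);
            Object transformer = clazz.getDeclaredConstructor().newInstance();
            Method method = clazz.getMethod("transformRow", Map.class);
            return method.invoke(transformer, row);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not invoke transformRow on " + className, e);
        }
    }
}
```

Because the method is looked up by name and parameter type, the transformer class needs a public transformRow(Map) method and a no-argument constructor for a lookup like this to succeed.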

== A General TrimTransformer ==
Suppose you want to write a general !TrimTransformer without hardcoding the column it operates on. You then need a flag on the field in data-config.xml to indicate that the !TrimTransformer should be applied to that field.
{{{
<entity name="artist" query="..." transformer="foo.TrimTransformer">
	<field column="artistName" trim="true" />
</entity>
}}}
Now you'll need to extend the [#transformer Transformer] abstract class and use the API methods in Context to get the list of fields in the entity and get attributes of the fields to detect if the flag is set.
{{{
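package foo;

import java.util.List;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class TrimTransformer extends Transformer {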

	public Map<String, Object> transformRow(Map<String, Object> row, Context context) {
		List<Map<String, String>> fields = context.getAllEntityFields();

		for (Map<String, String> field : fields) {
			// Check if this field has trim="true" specified in the data-config.xml
			String trim = field.get("trim");
			if ("true".equals(trim))	{
				// Apply trim on this field
				String columnName = field.get("column");
				// Get this field's value from the current row (row values are Objects)
				Object value = row.get(columnName);
				// Trim and put the updated value back in the current row
				if (value instanceof String)
					row.put(columnName, ((String) value).trim());
			}
		}

		return row;
	}

}
}}}
If the field is multi-valued, the value in the row is a List instead of a single object and needs to be handled accordingly. To use the Transformer and Context abstract classes, add the !DataImportHandler jar to your project as a dependency.
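
One way to handle the multi-valued case is sketched below. It assumes the row value is either a String, a List that may contain Strings, or something else that should be left alone. `TrimValues` is a hypothetical helper written for this illustration, not part of !DataImportHandler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DataImportHandler.
final class TrimValues {
    private TrimValues() {}

    // Trims a String, trims each String inside a List, and returns any other value unchanged.
    static Object trim(Object value) {
        if (value instanceof String) {
            return ((String) value).trim();
        }
        if (value instanceof List) {
            // Build a new list so the caller's original list is not modified.
            List<Object> trimmed = new ArrayList<>();
            for (Object item : (List<?>) value) {
                trimmed.add(item instanceof String ? ((String) item).trim() : item);
            }
            return trimmed;
        }
        return value; // null, numbers, dates and other types pass through
    }
}
```

Inside transformRow, a transformer could then read the value with row.get(columnName) and, if it is not null, store TrimValues.trim(value) back with row.put(columnName, ...).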