Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2009/02/10 16:20:41 UTC

[Solr Wiki] Update of "ExtractingRequestHandler" by GrantIngersoll

The following page has been changed by GrantIngersoll:
http://wiki.apache.org/solr/ExtractingRequestHandler

------------------------------------------------------------------------------
   * A default field name is required for indexing, but not for extraction only.
   * The default field name and any literal values are not mapped.  They can be boosted.  See the examples.
  
- == Identifiers ==
- 
- If you do not pass in a value for a unique ID field, and your schema requires one, the !SolrContentHandler will attempt to generate an ID for you.  The code for this looks like:
- {{{
-   protected String generateId(SchemaField uniqueField) {
-     //we don't have a unique field specified, so let's add one
-     String uniqId = null;
-     FieldType type = uniqueField.getType();
-     if (type instanceof StrField || type instanceof TextField) {
-       uniqId = metadata.get(ExtractingMetadataConstants.STREAM_NAME);
-       if (uniqId == null) {
-         uniqId = metadata.get(ExtractingMetadataConstants.STREAM_SOURCE_INFO);
-       }
-       if (uniqId == null) {
-         uniqId = metadata.get(Metadata.IDENTIFIER);
-       }
-       if (uniqId == null) {
-         //last chance, just create one
-         uniqId = UUID.randomUUID().toString();
-       }
-     } else if (type instanceof UUIDField){
-       uniqId = UUID.randomUUID().toString();
-     }
-     else {
-       uniqId = String.valueOf(getNextId());
-     }
-     return uniqId;
-   }
- }}}
- 
- NOTE, you can override this by implementing your own !SolrContentHandler as described below.
  
  == When To Use ==
  
@@ -79, +48 @@

  or
  
   * curl http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text  --data-binary @tutorial.html  -H 'Content-type:text/html'  
-        <!> NOTE, this literally streams the file, which does not, then, provide info to Solr about the name of the file, which means the !ExtractingRequestHandler will auto-generate an ID for the file, unless you specify one by adding a literal value (see below).
+        <!> NOTE: this streams the raw file content, so Solr receives no information about the name of the file.
  
  or whatever other way you know how to do it.  Don't forget to COMMIT!
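
  The commit reminder above could be sketched as a small shell helper. This is only a sketch: it assumes the same host and port as the curl example, and that the standard XML update handler is mapped at /update. The names solr_commit, SOLR_URL and DRY_RUN are invented here for illustration and are not part of Solr.

```shell
#!/bin/sh
# Assumed base URL, copied from the curl example above; override via the environment.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Hypothetical helper: post an explicit <commit/> so that documents sent
# earlier become visible to searches.
# With DRY_RUN=1 the curl command is printed instead of executed.
solr_commit() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "curl $SOLR_URL/update -H 'Content-type:text/xml' --data-binary '<commit/>'"
  else
    curl "$SOLR_URL/update" -H 'Content-type:text/xml' --data-binary '<commit/>'
  fi
}
```

  Run solr_commit once after the extract request(s) have been posted. Setting DRY_RUN=1 first shows the command without contacting the server.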