Posted to solr-user@lucene.apache.org by ahammad <ah...@gmail.com> on 2009/07/07 16:49:22 UTC

Question regarding ExtractingRequestHandler

Hello,

I've recently started using this handler to index MS Word and PDF files.
When I set ext.extract.only=true, I get back all the metadata that is
associated with that file.

To index, I need to set ext.extract.only=false. If I want to index all that
metadata along with the contents, which parameters do I need to pass in the
HTTP request? Do I have to explicitly define every field in the schema, or
can Solr generate those fields dynamically?

Thanks.
-- 
View this message in context: http://www.nabble.com/Question-regarding-ExtractingRequestHandler-tp24374393p24374393.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Question regarding ExtractingRequestHandler

Posted by Grant Ingersoll <gs...@apache.org>.
For metadata, you can add the ext.metadata.prefix parameter to the request
and then define a dynamic field that matches that prefix, such as:

&ext.metadata.prefix=metadata_

  <dynamicField name="metadata_*" type="string" indexed="true" stored="true"/>
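
As a rough illustration of how the request side might fit together, here is a
minimal Python sketch that only builds the request URL with the two parameters
discussed in this thread. The host, port, and the /update/extract path are
assumptions about a local setup, not something stated above, and the helper
name build_extract_url is hypothetical. Actually posting the document is left
out.

```python
from urllib.parse import urlencode


def build_extract_url(handler_url, metadata_prefix="metadata_", extract_only=False):
    """Build a request URL for the extracting handler.

    handler_url is wherever the handler is mapped in your solrconfig.xml;
    the value used in the example below is an assumption, not a given.
    """
    params = {
        # false means "index the document" rather than just returning the extraction
        "ext.extract.only": "true" if extract_only else "false",
        # every extracted metadata name gets this prefix, so it can be caught
        # by a dynamicField such as metadata_*
        "ext.metadata.prefix": metadata_prefix,
    }
    return handler_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local Solr instance; adjust to your own deployment.
    print(build_extract_url("http://localhost:8983/solr/update/extract"))
```

With the dynamicField above in the schema, any field whose name starts with
metadata_ should be accepted without being declared individually.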


Note that some of this is currently under review and may change. See https://issues.apache.org/jira/browse/SOLR-284

-Grant

