Posted to solr-user@lucene.apache.org by Joe Kessel <is...@hotmail.com> on 2009/08/25 02:45:51 UTC

Multiple field / pdf file per document

New to Solr, not so new to search. I have an existing data model that I am pushing into a Solr index. For example, I am indexing a product that includes product brochures in multiple locales, so a single Solr document contains multiple text fields, each of which requires its own linguistic analyzer. The data for these text fields comes from multiple PDF files: I currently support 4 locales, with a different PDF file for each locale. In addition, I have a number of other fields that are used by the application. Solr will return a reference that the application uses to determine which data and PDF to display. With the ExtractingRequestHandler, I don't see how I would be able to support multiple PDF files per document. I am planning to use Solr 1.4 for this project.

My assumption is that I will need to do the PDF parsing before sending the document to Solr. Is there a way to do the extraction at the field level? I want to make sure I am not missing something in Solr Cell before I invest the effort to parse the documents on the client side.

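To make the client-side option concrete, here is a minimal sketch of what I have in mind, not a tested implementation. It assumes the PDF text has already been extracted on the client (the extract_text callable is a hypothetical placeholder for whatever PDF parser ends up being used) and builds a single Solr XML <add> message with one text field per locale. The field names (id, brochure_<locale>) and the helper names are my own assumptions, not anything defined by Solr.

```python
import xml.etree.ElementTree as ET


def extract_all(pdf_by_locale, extract_text):
    """Run a caller-supplied extractor over one PDF path per locale.

    extract_text is a hypothetical callable (path -> str); the actual
    PDF parsing library is deliberately left out of this sketch.
    """
    return {locale: extract_text(path) for locale, path in pdf_by_locale.items()}


def build_add_xml(product_id, text_by_locale, extra_fields=None):
    """Build a Solr XML <add> message holding one document.

    Each locale's text goes into its own field (assumed naming:
    brochure_<locale>) so the schema can map each field to a
    locale-specific analyzer.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = product_id

    # Application fields that do not come from the PDFs.
    for name, value in (extra_fields or {}).items():
        ET.SubElement(doc, "field", name=name).text = value

    # One text field per locale, in a stable order.
    for locale in sorted(text_by_locale):
        field = ET.SubElement(doc, "field", name="brochure_" + locale)
        field.text = text_by_locale[locale]

    # ElementTree escapes characters such as < and & in the field text.
    return ET.tostring(add, encoding="unicode")
```

The resulting string would then be POSTed to the index's update handler as a normal XML update, so the extraction step never has to involve Solr at all.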

Thanks
