Posted to solr-user@lucene.apache.org by javaxmlsoapdev <vi...@yahoo.com> on 2009/11/04 23:42:18 UTC

Index documents with Solr

I wanted to find out how people are using Solr’s ExtractingRequestHandler to
index different types of documents, driven by a configuration file, in an
import fashion. I want to use this handler the way DataImportHandler works,
where you can issue an “import” command from the URL to build an index by
reading database table(s).

For documents, I have a db table that stores file paths. I want to read each
file’s location from the table, then index the document content extracted by
ExtractingRequestHandler. Again, I am trying to see whether all of this can
be done purely from configuration, the same way DataImportHandler handles
it.
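
For comparison, here is a minimal client-side sketch of the same flow done from a script rather than from configuration: read (id, path) rows from a table and post each file to the extracting handler. It is only an illustration under stated assumptions. The table and column names (documents, id, file_path), the use of sqlite3 as a stand-in for the real database, and the Solr URL are all hypothetical. The literal.id and resource.name request parameters are the ones documented for ExtractingRequestHandler; the handler path depends on how it is registered in solrconfig.xml.

```python
import sqlite3
import urllib.parse
import urllib.request

# Assumed Solr location and handler path; adjust to match solrconfig.xml.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_request(doc_id, path, base_url=SOLR_EXTRACT_URL):
    """Build a POST request that sends one file to the extracting handler."""
    params = urllib.parse.urlencode({
        "literal.id": doc_id,    # value for the document's id field
        "resource.name": path,   # file name hint for content type detection
    })
    with open(path, "rb") as fh:
        body = fh.read()
    return urllib.request.Request(
        base_url + "?" + params,
        data=body,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


def index_files(conn, send=urllib.request.urlopen):
    """Read (id, path) rows and send each file; returns the number sent."""
    count = 0
    # 'documents', 'id' and 'file_path' are hypothetical names.
    for doc_id, path in conn.execute("SELECT id, file_path FROM documents"):
        send(build_extract_request(str(doc_id), path))
        count += 1
    return count
```

Calling index_files(sqlite3.connect("files.db")) would post every listed file. A commit is still needed afterwards before the documents become searchable, for example by adding commit=true to the final request.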

-- 
View this message in context: http://old.nabble.com/Index-documents-with-Solr-tp26205991p26205991.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Index documents with Solr

Posted by javaxmlsoapdev <vi...@yahoo.com>.
Glock, did you get this approach to work? Let me know.

Thanks,

Glock, Thomas wrote:
> 
> I have a similar situation but am not expecting an easy setup. Currently
> the tables contain both a URL to the file and quite a bit of additional
> metadata about the file. I'm planning one initial load to Solr by
> creating XML in my own utility, which posts the XML. The data is messy, so
> DIH is not a good choice for this situation. After the initial load (only
> ~12K documents, 10 minutes tops), I plan to perform a second pass
> that will use the ExtractingRequestHandler. I know how the id will map,
> but it is not yet clear how to get that id to ExtractingRequestHandler. It
> would be good to see different examples on the Wiki. I have not yet made a
> first attempt; hoping to in a day or so.
> 
> 

-- 
View this message in context: http://old.nabble.com/Index-documents-with-Solr-tp26205991p26443551.html


RE: Index documents with Solr

Posted by "Glock, Thomas" <th...@pfizer.com>.
I have a similar situation but am not expecting an easy setup. Currently the
tables contain both a URL to the file and quite a bit of additional metadata
about the file. I'm planning one initial load to Solr by creating XML in my
own utility, which posts the XML. The data is messy, so DIH is not a good
choice for this situation. After the initial load (only ~12K documents, 10
minutes tops), I plan to perform a second pass that will use the
ExtractingRequestHandler. I know how the id will map, but it is not yet clear
how to get that id to ExtractingRequestHandler. It would be good to see
different examples on the Wiki. I have not yet made a first attempt; hoping
to in a day or so.
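
On getting the id to ExtractingRequestHandler: one documented route is the literal.<fieldname> request parameter, so the id can travel as literal.id on the extract request. The sketch below only builds the query string. The function name and the metadata field names are hypothetical, and any literal field must exist in the schema. One caveat, as far as I understand it: adding a document whose id already exists replaces the earlier document, so metadata loaded in the first pass may need to be resent as literal.* values in the second pass.

```python
import urllib.parse


def extract_params(doc_id, metadata=None, commit=False):
    """Return a query string for an extract request using literal.* fields."""
    pairs = [("literal.id", doc_id)]
    for field, value in sorted((metadata or {}).items()):
        # Each literal.<field> value is added to the document as given.
        pairs.append(("literal." + field, value))
    if commit:
        pairs.append(("commit", "true"))
    return urllib.parse.urlencode(pairs)
```

For example, extract_params("doc-17", {"title": "Q3 report"}) produces a string that can be appended after "?" to the handler URL, with the file itself sent as the request body.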

