Posted to solr-dev@lucene.apache.org by Kerwin Noronha <ke...@gmail.com> on 2009/11/13 05:42:31 UTC
Indexing multiple documents in Solr/SolrCell
Hi,
I am new to this forum and would like to know whether something like the function
described below has already been developed or exists in Solr. If it does not exist,
is it a good idea, and can I contribute it?
We need to index multiple documents in different formats, so we use Solr
with Tika (Solr Cell).
Question:
Can you index both metadata and content for multiple documents iteratively
in Solr?
For example, I have an XML file with metadata and links to each document's
content. There are many documents in this XML file, and I would like to index them
all without firing multiple URLs.
Example XML:
<add>
<doc>
<field name="id">34122</field>
<field name="author">Michael</field>
<field name="size">3MB</field>
<field name="URL">URL of the document</field>
</doc>
<doc>.....</doc> <!-- document 2 -->
...
<doc>.....</doc> <!-- document N -->
</add>
I need to index all of these documents by sending this XML file to a single URL.
The documents to be indexed could reside on a file system.
I have modified the Solr code to do this, but is there already an existing
feature?
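For illustration, here is a minimal client-side sketch in Python. It is an assumption and not an existing Solr feature. It parses a well-formed <add> file like the example above (attribute values quoted) and turns each <doc> into one request URL for the ExtractingRequestHandler. Metadata fields are passed as literal.<name> parameters, and the URL field is passed as stream.url so that Solr fetches the content itself. The function name build_extract_requests and the http://localhost:8983/solr/update/extract endpoint are hypothetical. stream.url only works if remote streaming is enabled in solrconfig.xml.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint: assumes Solr Cell's ExtractingRequestHandler is
# mapped at /update/extract on a local Solr instance. Adjust as needed.
EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_requests(metadata_xml, extract_url=EXTRACT_URL):
    """Return one extract request URL per <doc> in a Solr-style <add> file.

    Hypothetical helper: every <field> except "URL" becomes a
    literal.<name> parameter; the "URL" field becomes stream.url
    (requires remote streaming to be enabled in solrconfig.xml).
    """
    root = ET.fromstring(metadata_xml)
    requests = []
    for doc in root.findall("doc"):  # direct <doc> children of <add>
        params = []
        for field in doc.findall("field"):
            name = field.get("name")
            value = (field.text or "").strip()
            if name == "URL":
                # Let Solr fetch the document content from this location.
                params.append(("stream.url", value))
            else:
                # Store the metadata value verbatim in the named field.
                params.append(("literal." + name, value))
        requests.append(extract_url + "?" + urlencode(params))
    return requests
```

Each returned URL would still have to be sent separately (for example with urllib.request.urlopen). This sketch therefore produces one request per document, not the single-request behaviour asked about above.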
Re: Indexing multiple documents in Solr/SolrCell
Posted by Chris Hostetter <ho...@fucit.org>.
: Question:
: Can you index both metadata and content for multiple documents iteratively
: in Solr?
As I understand it, there is some ongoing work on the DataImportHandler
to support this type of thing: having entities from one DataSource
reference other entities that are then processed using Tika.
You may want to check the open issues regarding DataImportHandler and/or
search the recent mail archives for Tika + DataImportHandler.
-Hoss