Posted to solr-dev@lucene.apache.org by Kerwin Noronha <ke...@gmail.com> on 2009/11/13 05:42:31 UTC

Indexing multiple documents in Solr/SolrCell

Hi,

I am new to this forum and would like to know whether something like the
function described below has already been developed or exists in Solr. If it
does not exist, is it a good idea, and can I contribute it?

We need to index multiple documents in different formats, so we use Solr
with Tika (Solr Cell).

Question:
Can you index both metadata and content for multiple documents iteratively
in Solr?
For example, I have an XML file with metadata and links to the documents'
content. There are many documents in this XML, and I would like to index them
all without firing multiple URLs.

Example of XML
<add>
<doc>
<field name="id">34122</field>
<field name="author">Michael</field>
<field name="size">3MB</field>
<field name="URL">URL of the document</field>
</doc>
</add>
<doc2>.....</doc2>...</docN>

I need to index all these documents by sending a single URL with this XML
file. The collection of documents to be indexed could be on a file system.
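
For comparison, the per-document approach that a single request would replace
can be sketched on the client side. This is only a minimal sketch under stated
assumptions, not an existing Solr feature: it assumes Solr Cell is mounted at
/update/extract, that remote streaming is enabled so Solr can fetch stream.url
itself, and that the base URL and sample document URL are placeholders. The
function name build_extract_requests is hypothetical. Each metadata field is
passed as a literal.<name> parameter, and the URL field becomes stream.url.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical sample input: a well-formed version of the metadata XML,
# with a placeholder document URL.
SAMPLE_XML = """<add>
  <doc>
    <field name="id">34122</field>
    <field name="author">Michael</field>
    <field name="size">3MB</field>
    <field name="URL">http://example.com/docs/34122.pdf</field>
  </doc>
</add>"""


def build_extract_requests(xml_text, solr_base="http://localhost:8983/solr"):
    """Return one /update/extract URL per <doc> found in the metadata XML."""
    urls = []
    for doc in ET.fromstring(xml_text).iter("doc"):
        params = []
        for field in doc.findall("field"):
            name = field.get("name")
            value = (field.text or "").strip()
            if name == "URL":
                # Assumption: Solr fetches the content itself via remote streaming.
                params.append(("stream.url", value))
            else:
                # Metadata is attached to the extracted document as literal fields.
                params.append(("literal." + name, value))
        urls.append(solr_base + "/update/extract?" + urlencode(params))
    return urls


if __name__ == "__main__":
    for url in build_extract_requests(SAMPLE_XML):
        print(url)
```

This still issues one request per document, which is exactly the overhead the
question is about; it only shows how the metadata and the content link from
the XML map onto Solr Cell parameters.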

I have altered the Solr code to be able to do this, but is there an already
existing feature?

Re: Indexing multiple documents in Solr/SolrCell

Posted by Chris Hostetter <ho...@fucit.org>.
: Question:
: Can you index both metadata and content for multiple documents iteratively
: in Solr?

As I understand it, there is some ongoing work with the DataImportHandler
to support this type of thing -- having entities from one DataSource
reference other entities that are then processed using Tika.

You may want to check the open issues regarding DataImportHandler, and/or
search the recent mail archives for tika + DataImportHandler.


-Hoss