Posted to solr-user@lucene.apache.org by Tod <li...@gmail.com> on 2012/01/25 16:41:44 UTC

Indexing Using XML Message

I have a local data store containing a host of different document types.
This data store is separate from a remote Solr install, which makes
streaming not an option.  Instead I'd like to generate an XML file that
contains all of the documents, including content and metadata.

What would be the most appropriate way to accomplish this?  I could use
the Tika CLI to generate XML, but I'm not sure it would work or that it's
the most efficient way to handle things.  Can anyone offer some suggestions?


Thanks - Tod

Re: Indexing Using XML Message

Posted by Erick Erickson <er...@gmail.com>.
So you can't even communicate with the remote Solr process by HTTP?
Because if you can, SolrJ would work.

Otherwise, you're stuck with creating a bunch of Solr-style XML
documents, which have a simple format. See the example/exampledocs
directory in the standard distribution. You'll have to parse the
separate document types and put your required data into the Solr
XML format...
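
For illustration, here is a minimal Python sketch of building that kind of
message with only the standard library. The field names (id, title,
content, keywords) and the build_add_message helper are hypothetical; the
real names have to match whatever your schema.xml defines, and the text is
assumed to have been extracted from each document already.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr-style <add> update message from an iterable of dicts.

    Each dict maps a field name to a value, or to a list of values for a
    multi-valued field. Field names here are illustrative only.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list or tuple becomes repeated <field> elements.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


# Hypothetical documents, already parsed into plain text and metadata.
docs = [
    {"id": "doc-1", "title": "Report & summary", "content": "Text extracted from a PDF"},
    {"id": "doc-2", "title": "Memo", "keywords": ["solr", "xml"]},
]
message = build_add_message(docs)
```

Letting an XML library do the serialization, rather than concatenating
strings, takes care of escaping characters such as & and < in the
extracted text.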

But I really don't understand why you need to. A Solr installation
that you can't get to via http is pretty useless, although I suppose
there can be security setups that preclude this. Assuming you can
get there via http, consider a SolrJ program combined with Tika to
parse the docs you have in all these formats and send them to Solr
via SolrJ...
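
SolrJ is a Java client, so the following is not SolrJ itself, only a rough
Python sketch of the same idea at the HTTP level: POST an XML update
message to Solr's update handler. The URL below is an assumption (a
default single-core install on localhost), the helper names are made up
for illustration, and the Tika extraction step is left out.

```python
import urllib.request

# Assumption: default single-core Solr on localhost; adjust to your install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def make_update_request(xml_message, url=SOLR_UPDATE_URL, commit=True):
    """Prepare (but do not send) a POST of an XML update message."""
    target = url + "?commit=true" if commit else url
    return urllib.request.Request(
        target,
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Building the request needs no network access; send(request) would post it.
request = make_update_request('<add><doc><field name="id">doc-1</field></doc></add>')
```

Committing on every request is convenient for a quick test but slow for
bulk loads; for many documents it is usually better to pass commit=False
and commit once at the end.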

Best
Erick

On Wed, Jan 25, 2012 at 7:41 AM, Tod <li...@gmail.com> wrote:
> I have a local data store containing a host of different document types.
>  This data store is separate from a remote Solr install making streaming not
> an option.  Instead I'd like to generate an XML file that contains all of
> the documents including content and metadata.
>
> What would be the most appropriate way to accomplish this?  I could use the
> Tika CLI to generate XML but I'm not sure it would work or that its the most
> efficient way to handle things.  Can anyone offer some suggestions?
>
>
> Thanks - Tod