Posted to solr-user@lucene.apache.org by Tod <li...@gmail.com> on 2012/01/25 16:41:44 UTC
Indexing Using XML Message
I have a local data store containing a host of different document types.
This data store is separate from a remote Solr install, making
streaming not an option. Instead I'd like to generate an XML file that
contains all of the documents including content and metadata.
What would be the most appropriate way to accomplish this? I could use
the Tika CLI to generate XML but I'm not sure it would work or that it's
the most efficient way to handle things. Can anyone offer some suggestions?
Thanks - Tod
Re: Indexing Using XML Message
Posted by Erick Erickson <er...@gmail.com>.
So you can't even communicate with the remote Solr process by HTTP?
Because if you can, SolrJ would work.
Otherwise, you're stuck with creating a bunch of Solr-style XML
documents, which have a simple format. See the example/exampledocs
directory in the standard distribution. You'll have to parse the
separate document types and put your required data into the Solr
XML format...
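
For illustration, here is a minimal sketch of building a Solr-style <add> message in Python with the standard library. The field names (id, title, content) and the helper build_add_message are hypothetical and would have to match your own schema.xml. It assumes the text has already been extracted from each document (by Tika or otherwise). Extracted text often carries control characters that are illegal in XML, so the sketch strips them before writing.

```python
import re
import xml.etree.ElementTree as ET

# Characters that are NOT allowed in an XML 1.0 document.
_INVALID_XML = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def build_add_message(docs):
    """Build a Solr <add> update message from a list of dicts.

    Each dict maps a field name to a value or a list of values.
    A list becomes repeated <field> elements (a multivalued field).
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                # ElementTree escapes &, < and > but does not remove
                # illegal control characters, so strip those here.
                field.text = _INVALID_XML.sub("", str(v))
    return ET.tostring(add, encoding="unicode")


# Hypothetical documents; field names are assumptions, not a real schema.
docs = [
    {
        "id": "doc-1",
        "title": "Quarterly report",
        "content": "Revenue rose <5%> & costs fell.",
    },
]
xml_message = build_add_message(docs)
```

Writing xml_message to a file gives something in the same shape as the sample files shipped with Solr, ready to be posted whenever the remote instance is reachable.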
But I really don't understand why you need to. A Solr installation
that you can't get to via HTTP is pretty useless, although I suppose
there can be security setups that preclude this. Assuming you can
get there via HTTP, consider a SolrJ program combined with Tika to
parse the docs you have in all these formats and send them to Solr
via SolrJ...
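
As a rough sketch of the "send it over HTTP" half, the following uses a plain HTTP POST to Solr's XML update handler rather than SolrJ (SolrJ is a Java client; this is a stand-in to show the shape of the exchange, not the SolrJ API). The base URL and the helper names are hypothetical, and the Tika parsing step is left out entirely. The request is built separately from sending it so it can be inspected without a live server.

```python
import urllib.request


def make_update_request(solr_url, xml_message):
    """Prepare (but do not send) a POST to Solr's XML update handler."""
    return urllib.request.Request(
        url=solr_url.rstrip("/") + "/update",
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_update(solr_url, xml_message, timeout=30):
    """Send the update message; returns (HTTP status, response body)."""
    request = make_update_request(solr_url, xml_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


# Hypothetical base URL; replace with the real host, port and core path.
SOLR_URL = "http://solr.example.com:8983/solr"

add_request = make_update_request(
    SOLR_URL, '<add><doc><field name="id">doc-1</field></doc></add>'
)
# Documents are not searchable until a commit is issued.
commit_request = make_update_request(SOLR_URL, "<commit/>")
```

Calling post_update(SOLR_URL, message) for the <add> message and then for "<commit/>" would index the documents, assuming the server accepts updates from your machine.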
Best
Erick