Posted to solr-user@lucene.apache.org by Suresh Kannan <su...@indiapages.com> on 2007/04/06 15:48:07 UTC

Posting PDF,DOC,TXT

I would like to post PDF, DOC, and TXT files to Solr for indexing.

Suresh

RE: Posting PDF,DOC,TXT - Hijacked thread

Posted by Bill Tantzen <ta...@tc.umn.edu>.
Thank you - I don't know how I missed that!

Bill Tantzen
University of Minnesota Libraries
tantz001@tc.umn.edu
612-626-9949 (office)  612-325-1777 (cell) 
________________________________________________________________

I guess the man's a genius, but what
a dirty mind he has, hasn't he? -- Nora Joyce 

> -----Original Message-----
> From: Greg Ludington [mailto:gludington@gmail.com] 
> Sent: Friday, April 06, 2007 12:04 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Posting PDF,DOC,TXT - Hijacked thread
> 
> This page on the wiki is probably your best place to start:
> 
> http://wiki.apache.org/solr/UpdateXmlMessages
> 
> -Greg
> 


Re: Posting PDF,DOC,TXT - Hijacked thread

Posted by Greg Ludington <gl...@gmail.com>.
This page on the wiki is probably your best place to start:

http://wiki.apache.org/solr/UpdateXmlMessages

-Greg

On 4/6/07, Bill Tantzen <ta...@tc.umn.edu> wrote:
> >
> > There's no way to do that directly at the moment; you'll need
> > to convert them to the XML format that Solr expects.
> >
>
> Would someone be willing to point me to a resource that describes this
> format?
>
> Cheers!
> Bill

RE: Posting PDF,DOC,TXT - Hijacked thread

Posted by Bill Tantzen <ta...@tc.umn.edu>.
> 
> There's no way to do that directly at the moment; you'll need
> to convert them to the XML format that Solr expects.
> 

Would someone be willing to point me to a resource that describes this
format?

Cheers!
Bill



Re: Posting PDF,DOC,TXT

Posted by Bertrand Delacretaz <bd...@apache.org>.
On 4/6/07, Suresh Kannan <su...@indiapages.com> wrote:
> I would like to post PDF, DOC, and TXT files to Solr for indexing.

There's no way to do that directly at the moment; you'll need to
convert them to the XML format that Solr expects.
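A minimal sketch of that conversion in Python, assuming the <add><doc><field name="..."> layout described on the UpdateXmlMessages wiki page. The field names id, title, and text used here are hypothetical; they have to match fields declared in your own schema.xml. ElementTree takes care of escaping characters such as "&" that often turn up in extracted text.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> update message as a string.

    docs: iterable of dicts mapping field name -> value, where a value is
    either a single string or a list of strings (one <field> per item,
    as for a multi-valued field). Field names are assumptions and must
    exist in the target schema.xml.
    """
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(v)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="unicode")


# Hypothetical document whose body text came from some extraction step.
message = build_add_message(
    [{"id": "report-1", "title": "Annual report", "text": "Body text & more"}]
)
print(message)
# <add><doc><field name="id">report-1</field><field name="title">Annual report</field><field name="text">Body text &amp; more</field></doc></add>
```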

The Lucene FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ lists a
number of tools that can help extract content and metadata from
various formats.
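Once the text has been extracted, the XML message still has to be sent to Solr over HTTP. The Python sketch below assumes the example server's XML update handler is at http://localhost:8983/solr/update (adjust this for your installation). It reads .txt files directly and deliberately leaves PDF and DOC as a stub, because those formats need one of the external extraction tools listed in the FAQ.

```python
import pathlib
import urllib.request

# Assumption: URL of the example Solr server's XML update handler.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def extract_text(path):
    """Return the plain text of a file so it can go into a Solr <field>."""
    p = pathlib.Path(path)
    suffix = p.suffix.lower()
    if suffix == ".txt":
        return p.read_text(encoding="utf-8")
    # PDF and DOC are binary formats that Solr cannot parse itself, so an
    # external extraction tool has to produce the text. Hook one in here.
    raise NotImplementedError(f"no text extractor configured for {suffix!r} files")


def make_update_request(xml_message, url=SOLR_UPDATE_URL):
    """Wrap an XML update message (e.g. <add>...</add> or <commit/>) in a POST."""
    return urllib.request.Request(
        url,
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def send(req):
    """Send the request and return Solr's raw response body."""
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

A typical run sends one or more <add> messages and then a separate <commit/> message, for example send(make_update_request("<commit/>")), because newly added documents do not show up in searches until a commit.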

-Bertrand