Posted to solr-user@lucene.apache.org by Suresh Kannan <su...@indiapages.com> on 2007/04/06 15:48:07 UTC
Posting PDF,DOC,TXT
I would like to post PDF, DOC, TXT into SOLR to do the indexing.
Suresh
RE: Posting PDF,DOC,TXT - Hijacked thread
Posted by Bill Tantzen <ta...@tc.umn.edu>.
Thank you - I don't know how I missed that!
Bill Tantzen
University of Minnesota Libraries
tantz001@tc.umn.edu
612-626-9949 (office) 612-325-1777 (cell)
________________________________________________________________
I guess the man's a genius, but what
a dirty mind he has, hasn't he? -- Nora Joyce
Re: Posting PDF,DOC,TXT - Hijacked thread
Posted by Greg Ludington <gl...@gmail.com>.
This page on the wiki is probably your best place to start:
http://wiki.apache.org/solr/UpdateXmlMessages
-Greg
On 4/6/07, Bill Tantzen <ta...@tc.umn.edu> wrote:
> >
> > There's no way to do that directly at the moment, you'll need
> > to convert them to the XML format that Solr expects.
> >
>
> Would someone be willing to point me to a resource that describes this
> format?
>
> Cheers!
> Bill
RE: Posting PDF,DOC,TXT - Hijacked thread
Posted by Bill Tantzen <ta...@tc.umn.edu>.
>
> There's no way to do that directly at the moment, you'll need
> to convert them to the XML format that Solr expects.
>
Would someone be willing to point me to a resource that describes this
format?
Cheers!
Bill
Re: Posting PDF,DOC,TXT
Posted by Bertrand Delacretaz <bd...@apache.org>.
On 4/6/07, Suresh Kannan <su...@indiapages.com> wrote:
> I would like to post PDF, DOC, TXT into SOLR to do the indexing.
There's no way to do that directly at the moment; you'll need to
convert them to the XML format that Solr expects.
The Lucene FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ lists a
number of tools that can help extract content and metadata from
various formats.
-Bertrand
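For readers following the thread, here is a minimal sketch of the conversion step, assuming the text has already been extracted from the PDF, DOC, or TXT file with one of the tools the Lucene FAQ lists. It builds an <add> message of the kind described on the UpdateXmlMessages wiki page. The function name build_add_message and the field names "id" and "text" are assumptions for illustration; the real field names must match the fields defined in your Solr schema.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr <add> update message from a list of field dicts.

    Each dict maps a field name to a string value. The field names are
    hypothetical here and must exist in your own schema.
    """
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            # ElementTree escapes &, < and > in the text when serializing.
            field.text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # The text below stands in for content extracted from a document.
    message = build_add_message(
        [{"id": "notes.txt", "text": "Plain text & extracted content"}]
    )
    print(message)
```

The resulting string would then be sent in an HTTP POST to the update handler of your Solr instance, followed by a <commit/> message so that the new documents become searchable.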