Posted to java-user@lucene.apache.org by Jun Zhou <AC...@sheffield.ac.uk> on 2002/07/26 16:52:00 UTC
index other document types
Dear all,
I learned from the Lucene FAQ that to index other document types, we need to provide a parser or extractor for each type. I know some tools are available that can convert other document types to txt format. Does such a converter count as a parser or extractor?
Thank you for your kind assistance in advance.
Best regards
Jun Zhou
acp01jz@sheffield.ac.uk
Re: index other document types
Posted by Jun Zhou <AC...@sheffield.ac.uk>.
Thank you very much, Dave! Now I am sure I can use Lucene for my project.
Best regards
Jun Zhou
ACP01JZ@sheffield.ac.uk
----- Original Message -----
From: "Dave Peixotto" <pe...@geofolio.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Friday, July 26, 2002 4:34 PM
Subject: Re: index other document types
> Lucene is very good at indexing and searching text documents. If you need
> to index other types of documents (Word docs, PDFs, etc.) then a good
> strategy is to convert those documents to text and use Lucene to index the
> text version of the document. If you already have a tool to convert other
> document types to text, then you should have no trouble indexing those
> documents.
Re: index other document types
Posted by Dave Peixotto <pe...@geofolio.com>.
Lucene is very good at indexing and searching text documents. If you need
to index other types of documents (Word docs, PDFs, etc.) then a good
strategy is to convert those documents to text and use Lucene to index the
text version of the document. If you already have a tool to convert other
document types to text, then you should have no trouble indexing those
documents.
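The convert-then-index strategy described above could be sketched roughly as follows. This is only an illustration in Python, not Lucene code: `ToyIndex` is a hypothetical stand-in for a real text indexer such as Lucene, and `CONVERTERS`, `html_to_text`, `plain_to_text` and `index_document` are made-up names. In a real setup each converter would be whatever tool already turns a given document type (Word, PDF, and so on) into text.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the character data of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(raw):
    """Example converter: reduce HTML markup to its plain text."""
    collector = _TextCollector()
    collector.feed(raw)
    collector.close()
    return " ".join(collector.chunks)


def plain_to_text(raw):
    """Plain text needs no conversion."""
    return raw


# Hypothetical registry: one converter per document type.
CONVERTERS = {"txt": plain_to_text, "html": html_to_text}


class ToyIndex:
    """Tiny inverted index, standing in for a real indexer such as Lucene."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Lowercase and split into alphanumeric terms, then record the document.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[term].add(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


def index_document(index, doc_id, doc_type, raw):
    """Convert the document to text first, then index only the text."""
    converter = CONVERTERS.get(doc_type)
    if converter is None:
        raise ValueError(f"no converter registered for type {doc_type!r}")
    index.add(doc_id, converter(raw))
```

The indexer only ever sees plain text, so supporting a new document type means registering one more converter; the indexing side stays unchanged.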